Author: Dann Corbit
Date: 10:31:19 05/25/05
On May 25, 2005 at 12:58:46, Roger D Davis wrote:

>On May 25, 2005 at 05:35:14, emerson tan wrote:
>
>>s hydra now stronger than deep blue?
>
>We know Kasparov, even then, was a much stronger player than Adams is today. If
>Hydra, supposedly stronger than Deep Blue, loses to a much weaker player, then
>that provides a strong argument that Hydra is weaker than Deep Blue.

In a short match, anything can happen. The error bar for the Elo calculation will be nearly a thousand Elo for a match of this length. In short, it tells us almost nothing about who is stronger. It does tell us who is the winner, and that is about all.

>On the other hand, if Adams loses, then it says nothing about Hydra's strength
>relative to Deep Blue.

We get about the same amount of information either way.

>I guess you could always argue that Deep Blue can beat Kasparov and Kasparov can
>beat Adams and Adams can beat Hydra and Hydra can beat Deep Blue, but it doesn't
>seem likely. Particularly if Adams can get a convincing score.

In both cases, the experiments are very simple. A single contest between only two opponents tells you only about the two combatants relative to each other.

Consider the SSDF ProDeo/Shredder match going on right now. ProDeo is taking a real butt-whupping. But the single contest is not truly indicative of ProDeo's strength. It's just a bad matchup for ProDeo.

In a similar vein, you need a broad spectrum of opponents to get a good gauge of strength for a chess player (man or machine) in order to make a logical judgement about how strong they are.

Consider a contest in which 14 different programs each play 26 games against one another:

   Program           Elo    +    -  Games   Score  Av.Op.   Draws
 1 Shredder 9     : 2793  108  138     26  76.9 %    2584  30.8 %
 2 Gandalf 6.01   : 2760  113  170     26  73.1 %    2586  15.4 %
 3 Toga II 0.93   : 2688  125  112     26  63.5 %    2592  34.6 %
 4 List 5.12      : 2662  130  146     26  59.6 %    2594  11.5 %
 5 Ruffian 2.1.0  : 2636  137   83     26  55.8 %    2596  50.0 %
 6 Spike 0.9      : 2611  143  101     26  51.9 %    2598  34.6 %
 7 Deep Sjeng 1.6 : 2586  101  143     26  48.1 %    2600  34.6 %
 8 Zappa 1.0      : 2574  118  140     26  46.2 %    2601  23.1 %
 9 Ktulu 7.0      : 2561  104  137     26  44.2 %    2602  34.6 %
10 Pharaon 3.2    : 2549  112  133     26  42.3 %    2603  30.8 %
11 Fruit 2.0      : 2536  108  130     26  40.4 %    2604  34.6 %
12 Yace 0.99.87   : 2523   91  128     26  38.5 %    2605  46.2 %
13 Patriot 1.3.0  : 2468  143  117     26  30.8 %    2609  23.1 %
14 LambChop 10.99 : 2453  138  115     26  28.8 %    2610  26.9 %

Notice that the error bars are more than 200 Elo wide (the + and - columns combined run from about 220 to about 280) even with 26 games and a field of 14 different programs.

With a single opponent and nine games, the error bar is 597 Elo (236 + 361):

   Program           Elo    +    -  Games   Score  Av.Op.   Draws
 1 Rebel12_Cb     : 2640  236  361      9  83.3 %    2360  11.1 %
 2 Ruffian 1.0.5  : 2360  361  236      9  16.7 %    2640  11.1 %

Essentially, we cannot tell anything important about strength from this match except that Rebel12_Cb is more likely to be stronger than Ruffian 1.0.5 than the reverse. But even that is very tenuous, given the data used to compile it.
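To make the point about short matches concrete, here is a minimal Python sketch (not part of the original post) that turns a win/draw/loss record into an Elo difference with a rough 95% interval. Several things here are assumptions: the function names `elo_diff` and `match_interval` are made up for illustration, the 7/1/1 record is inferred from the 83.3 % score and 11.1 % draw figures in the nine-game table, and the interval uses a plain normal approximation on the per-game score rather than whatever method the rating tool behind the tables uses. Its numbers will therefore not match the + and - columns above; it only shows how the width depends on the number of games.

```python
import math


def elo_diff(score):
    """Elo difference implied by a score fraction under the logistic Elo model."""
    if score <= 0.0:
        return -math.inf  # a 0 % score puts no lower limit on the difference
    if score >= 1.0:
        return math.inf   # a 100 % score puts no upper limit on the difference
    return -400.0 * math.log10(1.0 / score - 1.0)


def match_interval(wins, draws, losses, z=1.96):
    """Return (estimate, low, high) in Elo for one side of a single match.

    Assumption of this sketch: a normal approximation on the mean score
    per game, with z = 1.96 for a roughly 95 % interval.
    """
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    # Variance of the per-game score, where each game is worth 1, 0.5 or 0.
    var = (wins * (1.0 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * (0.0 - mean) ** 2) / n
    half = z * math.sqrt(var / n)
    return elo_diff(mean), elo_diff(mean - half), elo_diff(mean + half)


# Same score percentage, increasing match length (hypothetical records).
for w, d, l in [(7, 1, 1), (70, 10, 10), (700, 100, 100)]:
    est, low, high = match_interval(w, d, l)
    print(f"{w + d + l:4d} games: {est:+.0f} Elo, interval {low:+.0f} to {high:+.0f}")
```

With the nine-game record, the upper end of the interval runs off the top of the scale under this approximation, which is another way of saying that the match leaves the rating difference almost unconstrained. Only the much longer hypothetical matches pin it down.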