Computer Chess Club Archives



Subject: Re: is hydra now stronger than deep blue?

Author: Dann Corbit

Date: 10:31:19 05/25/05



On May 25, 2005 at 12:58:46, Roger D Davis wrote:

>On May 25, 2005 at 05:35:14, emerson tan wrote:
>
>>Is hydra now stronger than deep blue?
>
>We know Kasparov, even then, was a much stronger player than Adams is today. If
>Hydra, supposedly stronger than Deep Blue, loses to a much weaker player, then
>that provides a strong argument that Hydra is weaker than Deep Blue.

In a short match, anything can happen.  The error bar for the Elo calculation
will be nearly a thousand Elo for a match of this length.  In short, it tells us
almost nothing about who is stronger.  It does tell us who is the winner, and
that is about all.
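
To put rough numbers on how slowly the error bar shrinks, here is a minimal sketch.  It assumes the usual logistic Elo curve, two evenly matched players, and a plain normal approximation to the match score; the function names and the game counts in the loop are made up for illustration, and this is not the method any particular rating tool uses.

```python
import math

def elo_diff(score):
    """Elo difference implied by an expected score under the logistic Elo model."""
    if score <= 0.0:
        return -math.inf
    if score >= 1.0:
        return math.inf
    return 400.0 * math.log10(score / (1.0 - score))

def elo_margin(games, draw_rate=0.0, z=1.96):
    """Approximate 95% half-width, in Elo, for an evenly matched pair.

    Assumption: results are independent and the true expected score is 0.5.
    """
    # Per-game variance of the result (1, 0.5 or 0) when wins and losses are equally likely
    variance = 0.25 * (1.0 - draw_rate)
    std_error = math.sqrt(variance / games)
    # Convert the upper edge of the score interval back into an Elo difference
    return elo_diff(0.5 + z * std_error)

# Illustrative game counts only
for n in (6, 9, 26, 100, 1000):
    print(n, "games: roughly +/-", round(elo_margin(n)), "Elo")
```

The margin falls off only with the square root of the number of games, so a handful of games leaves it hundreds of Elo wide.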

>On the other hand, if Adams loses, then it says nothing about Hydra's strength
>relative to Deep Blue.

We get about the same amount of information either way.

>I guess you could always argue that Deep Blue can beat Kasparov and Kasparov can
>beat Adams and Adams can beat Hydra and Hydra can beat Deep Blue, but it doesn't
>seem likely. Particularly if Adams can get a convincing score.

In both cases, the experiments are very simple.  A single contest between two
opponents tells you only about those two combatants relative to each other.
Consider the SSDF ProDeo/Shredder match going on right now.  ProDeo is taking a
real butt-whupping.  But the single contest is not truly indicative of ProDeo's
strength.  It's just a bad matchup for ProDeo.

In a similar vein, you need a broad spectrum of opponents to get a good gauge of
strength for a chess player (man or machine) before you can make a logical
judgement about how strong they are.

Consider a tournament in which 14 different programs play each other, 26 games
apiece:
   Program          Elo    +   -   Games   Score   Av.Op.  Draws
 1 Shredder 9     : 2793  108 138    26    76.9 %   2584   30.8 %
 2 Gandalf 6.01   : 2760  113 170    26    73.1 %   2586   15.4 %
 3 Toga II 0.93   : 2688  125 112    26    63.5 %   2592   34.6 %
 4 List 5.12      : 2662  130 146    26    59.6 %   2594   11.5 %
 5 Ruffian 2.1.0  : 2636  137  83    26    55.8 %   2596   50.0 %
 6 Spike 0.9      : 2611  143 101    26    51.9 %   2598   34.6 %
 7 Deep Sjeng 1.6 : 2586  101 143    26    48.1 %   2600   34.6 %
 8 Zappa 1.0      : 2574  118 140    26    46.2 %   2601   23.1 %
 9 Ktulu 7.0      : 2561  104 137    26    44.2 %   2602   34.6 %
10 Pharaon 3.2    : 2549  112 133    26    42.3 %   2603   30.8 %
11 Fruit 2.0      : 2536  108 130    26    40.4 %   2604   34.6 %
12 Yace 0.99.87   : 2523   91 128    26    38.5 %   2605   46.2 %
13 Patriot 1.3.0  : 2468  143 117    26    30.8 %   2609   23.1 %
14 LambChop 10.99 : 2453  138 115    26    28.8 %   2610   26.9 %

Notice that the error bars (the + and - columns added together) are still
roughly 220 to 280 Elo wide, even with 26 games per program against a field of
13 different opponents.

With a single opponent and nine games, the error bar is 597 Elo:

   Program          Elo    +   -   Games   Score   Av.Op.  Draws
 1 Rebel12_Cb     : 2640  236 361     9    83.3 %   2360   11.1 %
 2 Ruffian 1.0.5  : 2360  361 236     9    16.7 %   2640   11.1 %

Essentially, we cannot tell anything important about strength from this match
except that Rebel12_Cb is more likely to be stronger than Ruffian 1.0.5 than the
reverse.  But even that is very tenuous, given how little data went into the
calculation.
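
As a cross-check on the nine-game table, the sketch below turns a win/draw/loss record into an Elo difference and a rough 95% interval.  The record of 7 wins, 1 draw and 1 loss is inferred from the 83.3 % score and 11.1 % draw rate over 9 games.  The logistic Elo formula and the normal approximation are my assumptions, and `match_interval` is a made-up helper, so the margins will not match the + and - columns above, which come from the rating program's own method.

```python
import math

def elo_diff(score):
    """Elo difference implied by an expected score under the logistic Elo model."""
    if score <= 0.0:
        return -math.inf
    if score >= 1.0:
        return math.inf
    return 400.0 * math.log10(score / (1.0 - score))

def match_interval(wins, draws, losses, z=1.96):
    """Return (estimate, low, high) Elo difference from one match.

    Assumption: games are independent and the score is roughly normal,
    which is a poor approximation for so few games.
    """
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    # Mean of squared per-game results (1, 0.25 or 0), used for the sample variance
    mean_sq = (wins + 0.25 * draws) / games
    std_error = math.sqrt(max(mean_sq - score ** 2, 0.0) / games)
    low = elo_diff(score - z * std_error)
    high = elo_diff(score + z * std_error)
    return elo_diff(score), low, high

# 7 wins, 1 draw, 1 loss: 7.5 points from 9 games
estimate, low, high = match_interval(7, 1, 1)
print("estimate:", round(estimate), "low:", round(low), "high:", high)
```

The point estimate agrees with the 280 Elo gap between the two ratings in the table.  Under this approximation the upper end of the score interval runs past 100 %, so the upper Elo bound comes out unbounded, which is another way of saying that nine games settle very little.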




Last modified: Thu, 15 Apr 21 08:11:13 -0700
