Computer Chess Club Archives



Subject: Re: If 75 Games are not considered a Statistical proof, neither is the SSDF.

Author: Bruce Moreland

Date: 12:57:25 01/30/01



On January 30, 2001 at 09:06:09, Jorge Pichard wrote:

>Ever since I matched Nimzo 8 vs Junior 6 on my AMD K6-2 500 MHz and again on
>my Athlon 800 MHz at G/60 and got different scores, some people have argued
>that those games were not statistically significant enough to prove anything
>at all. Then we must disregard the SSDF rating list, since each chess program
>only plays 40 games against each opponent, not 200.
>
>PS: I am still convinced that Nimzo 8 is one of the few programs, like
>Gandalf 4.32, that benefit the most from the best hardware available. And
>they are not programmed specifically to outperform Fritz 6 on a particular
>machine such as the AMD K6-2 450 MHz.
>
>Pichard.

When you play a match you get some information.  You can see how the programs
played against each other, you know who won the match, and you know the score.

It's perfectly valid to say that A beat B, if A won the match.  That's a simple
fact.  And you can look at the match and say that A played better than B.
That's more subjective, but perhaps you are expert enough that what you say is
true.

You also have the score of the match.  If A beats B in a three-game match by a
score of 2-1, you have some data that you can use to make a judgement about how
A compares with B.  That A won the match is beyond question, but whether A is
actually the stronger program, in the absolute sense, is still an open issue.

You can assert that A is better than B, but you can also assert that B is better
than A.  In this particular case, there is not a lot of difference in how well
supported these two assertions are.  If two programs are equal, and they are
known to draw 30% of the time, the odds that a given one of them would win a
three-game match by 2-1 or better are almost 40%.  So, assuming one of them is
at least the tiniest bit better than the other, it is more likely than not that
A is the better one, but there's still something like a 40% chance that B is at
least a tiny bit better than A.
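The "almost 40%" figure can be checked by brute force. The minimal Python sketch below assumes the post's model of two equal programs with a 30% draw rate, so each side wins 35% of games, and enumerates all 27 possible outcomes of a three-game match. Under that model one particular program scores 2 or more points (2-1 or better) with probability 0.37625, which matches the figure; scoring exactly 2 points has a probability of only about 22%. The function name is an illustrative choice, not anything from the original post.

```python
from itertools import product

# Assumed model (taken from the post): two equal programs, 30% draws,
# so each side wins 35% of the games.
P_WIN, P_DRAW, P_LOSS = 0.35, 0.30, 0.35
OUTCOMES = [(1.0, P_WIN), (0.5, P_DRAW), (0.0, P_LOSS)]


def prob_score_at_least(games: int, points: float) -> float:
    """Probability that one specific program scores >= points in a match."""
    total = 0.0
    # Enumerate every sequence of game results (3**games of them).
    for result in product(OUTCOMES, repeat=games):
        score = sum(pts for pts, _ in result)
        if score >= points:
            p = 1.0
            for _, prob in result:
                p *= prob
            total += p
    return total


p = prob_score_at_least(3, 2.0)
print(f"P(A scores 2-1 or better between equal programs) = {p:.4f}")
```

Enumeration is fine for a handful of games but grows as 3 to the power of the match length, so it is only a check on the three-game case.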

As you play more games, you may be able to reach your "A is better than B"
conclusion with more confidence, but this is not necessarily true.

If you play some games, and A completely wipes out B in terms of match score,
you can assert with fairly good reliability that A is better than B.  But if you
play a lot more games and the score is close, your confidence in that assertion
may actually be lower.

In order to make a good claim that A is better than B, A needs to beat B by such
a score that the odds of the score being due to chance are quite low.  In
practice, this takes a lot of games, unless the match is a blowout.
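To put a number on "the odds of the score being due to chance", the sketch below computes, for two programs of exactly equal strength (again assuming 30% draws), the probability that one named program scores at least a given number of points. It builds the exact score distribution game by game in half-points instead of enumerating every sequence, so it scales to long matches. The function and its parameters are hypothetical illustrations, not the SSDF's or anyone else's published method, and the test is one-sided (it asks only whether A's score is unusually high).

```python
def tail_probability(games: int, points: float,
                     p_win: float = 0.35, p_draw: float = 0.30) -> float:
    """P(a program of equal strength scores >= points over `games` games).

    Assumption: games are independent and every game has the same
    win/draw/loss probabilities.
    """
    p_loss = 1.0 - p_win - p_draw
    # dist[k] = probability of having scored exactly k half-points so far.
    dist = [1.0]
    for _ in range(games):
        new = [0.0] * (len(dist) + 2)
        for k, p in enumerate(dist):
            new[k] += p * p_loss      # loss: +0 half-points
            new[k + 1] += p * p_draw  # draw: +1 half-point
            new[k + 2] += p * p_win   # win:  +2 half-points
        dist = new
    target = round(points * 2)
    return sum(dist[target:])


blowout = tail_probability(5, 5.0)    # A sweeps a 5-game match 5-0
close = tail_probability(100, 55.0)   # A wins a 100-game match 55-45
print(f"5-0 in 5 games:     {blowout:.4f}")
print(f"55-45 in 100 games: {close:.4f}")
```

Under these assumptions a 5-0 sweep happens by luck only about 0.5% of the time, while a 55-45 result over 100 games happens by luck more than 10% of the time, so the longer but closer match is the weaker evidence even though it has twenty times as many games.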

The closer two programs are to each other in strength, the more games you will
generally need to play in order to show with reasonable confidence that one is
at least a little bit better than the other.
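The relationship between closeness in strength and the number of games needed can be made concrete with a rough normal-approximation sketch. It converts an assumed rating gap into an expected score with the standard Elo logistic formula, then asks how many games it takes before that edge sits about two standard errors (z = 1.96) above 50%. The 30% draw rate, the use of the evenly matched per-game variance, and the function names are assumptions for illustration; this is a back-of-the-envelope estimate, not a formal power calculation.

```python
import math


def expected_score(elo_diff: float) -> float:
    """Standard Elo logistic expected score for the stronger side."""
    return 1.0 / (1.0 + 10 ** (-elo_diff / 400.0))


def games_needed(elo_diff: float, draw_rate: float = 0.30,
                 z: float = 1.96) -> int:
    """Rough number of games before the expected edge is z standard errors.

    Assumption: the per-game variance is taken from an evenly matched pair
    with the given draw rate, which is close enough for small rating gaps.
    """
    edge = expected_score(elo_diff) - 0.5
    p_win = (1.0 - draw_rate) / 2.0
    # Var(x) = E[x^2] - E[x]^2 with x in {0, 0.5, 1} and E[x] = 0.5
    variance = p_win * 1.0 + draw_rate * 0.25 - 0.25
    return math.ceil(variance * (z / edge) ** 2)


for gap in (100, 50, 20, 10):
    print(f"{gap:>3} Elo: about {games_needed(gap)} games")
```

With these assumptions a 100-point gap needs on the order of 35 games, while a 10-point gap needs more than three thousand; the count grows roughly with the inverse square of the gap.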

There is no rule of thumb about how many games are enough; it depends completely
upon the score of the match.

When you talk about the SSDF list, that's a different thing.  The games are
played as a series of matches, but your score in any one match doesn't determine
your position on the list; your total score against all opponents does.  If what
you are trying to do is measure general chess strength, it would probably be
better, in terms of figuring out who deserves the top spot on the list, if there
were many more opponents and the matches were shorter.

bruce





Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.