Computer Chess Club Archives


Subject: Re: If 75 Games are not considered a Statistical proof, neither is the SSDF.

Author: Bruce Moreland

Date: 15:43:12 01/30/01


On January 30, 2001 at 17:42:59, Dann Corbit wrote:

>Additional measurements will not (in general) make the answer less accurate
>(unless something is wrong with the measurements).

If A is 1000 Elo points stronger than B, you will probably have a more accurate
answer after 20 games than you will after 100 games if A is 1 Elo point stronger
than B, and you get close to the expected 50-50 result.

It's not just the number of games; another major factor is the actual relative
strength, compared against the strength of the assertion you are trying to prove.
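
To put rough numbers on the comparison above, here is a minimal sketch. It assumes the standard Elo logistic model for the expected score and treats every game as a win or a loss (no draws); the function names are made up for illustration and are not from the post.

```python
import math

def expected_score(elo_diff):
    """Expected score per game under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def score_std_error(p, games):
    """Standard error of the mean score, assuming win/loss games only (no draws)."""
    return math.sqrt(p * (1.0 - p) / games)

def signal_in_std_errors(elo_diff, games):
    """How many standard errors the expected score sits above an even 50%."""
    p = expected_score(elo_diff)
    return (p - 0.5) / score_std_error(p, games)

for elo_diff, games in [(1000, 20), (1, 100)]:
    p = expected_score(elo_diff)
    print(f"diff={elo_diff:5d} games={games:4d} expected={p:.4f} "
          f"signal={signal_in_std_errors(elo_diff, games):.2f} std errors")
```

Under these assumptions the 1000-point gap stands dozens of standard errors clear of 50% after only 20 games, while the 1-point gap is a small fraction of one standard error after 100.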

If you are trying to prove that A is no more than 1000 Elo points worse than B,
it will almost certainly be very easy to make this assertion confidently after
20 games, if the two programs are the same strength.  If A really is about 1000
points worse than B, it will be harder.
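
The same point can be sketched as a one-sided binomial test. The setup is an assumption made for illustration: the null hypothesis is "A is exactly 1000 Elo worse than B", games are win/loss only, and the two match results (10 wins for A, and 0 wins for A) are hypothetical outcomes, not data from the thread.

```python
from math import comb

def expected_score(elo_diff):
    """Expected score per game under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Null hypothesis: A is exactly 1000 Elo worse than B.
p_null = expected_score(-1000)

# Equal programs: A wins about 10 of 20, which the null makes absurdly unlikely,
# so "A is 1000 points worse" is rejected with great confidence.
p_value_equal = binomial_tail(20, 10, p_null)

# A really about 1000 worse: A wins 0 of 20, which is just what the null predicts,
# so 20 games give no grounds for saying A is any better than that.
p_value_weak = binomial_tail(20, 0, p_null)

print(p_value_equal, p_value_weak)
```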

"A is stronger than B" can be a very weak claim, or it can be a very strong one.
That is why there is no fixed number of games necessary to prove this; it
depends upon the actual Elo difference as measured by the match.

Of course, if you ran 500 games you could certainly make a claim that the
difference can't be too far from what you have measured.  If you get 252-248 you
can't declare that A is better than B, but you can certainly declare that A is
not likely to be much worse than B.
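
As a rough illustration of the 252-248 case, the sketch below turns the result into an approximate 95% interval on the Elo difference. It assumes the Elo logistic model, a normal approximation to the binomial, and no draws; `elo_interval` is a hypothetical helper name.

```python
import math

def elo_from_score(p):
    """Invert the Elo logistic model: score fraction -> rating difference."""
    return -400.0 * math.log10(1.0 / p - 1.0)

def elo_interval(wins, losses, z=1.96):
    """Approximate interval on the Elo difference (normal approximation, no draws)."""
    games = wins + losses
    p = wins / games
    se = math.sqrt(p * (1.0 - p) / games)
    return elo_from_score(p - z * se), elo_from_score(p), elo_from_score(p + z * se)

low, mid, high = elo_interval(252, 248)
print(f"low={low:+.1f}  measured={mid:+.1f}  high={high:+.1f}")
```

Under these assumptions the interval runs from roughly 28 points below to roughly 33 points above, so it straddles zero (A can't be declared better) while ruling out a large deficit. Counting draws would generally narrow it.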

At the risk of being repetitive, the difficulty of proving an assertion about
the relative strength of two programs seems to depend heavily upon the degree to
which the assertion rides the razor edge of truth and falsity.  If it's just
barely true, you may never prove it.
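
A back-of-the-envelope sketch of "you may never prove it": how many games it takes before the expected margin over 50% equals about two standard errors. This again assumes the Elo logistic model and win/loss games only, and `games_needed` is an illustrative name. It is a rough sizing rule, not a proper power calculation.

```python
import math

def expected_score(elo_diff):
    """Expected score per game under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def games_needed(elo_diff, z=1.96):
    """Games until the expected margin over 50% equals z standard errors."""
    p = expected_score(elo_diff)
    margin = p - 0.5
    return math.ceil(z ** 2 * p * (1.0 - p) / margin ** 2)

for elo_diff in (100, 10, 1):
    print(f"true difference {elo_diff:3d} Elo -> about {games_needed(elo_diff)} games")
```

The count grows roughly with the inverse square of the difference, so a true 1-point edge needs several hundred thousand games under these assumptions.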

bruce

>However, if two programs are
>about equal, you will [basically] never determine which is stronger by playing
>them against each other.  For anyone who would like to prove this to themselves,
>just play a program against itself 10 times, 50 times, 100 times and 1000 times.
> The figure *should* [obviously] hover around 50% points scored for each side.
>It is very unlikely that the ten game match will be close to 50%.  The 100 game
>match will probably be fairly close.  It is rather unlikely that the 1000 game
>match will be far from 50%, but it is very unlikely it will be exactly 50%.  In
>fact, if it should be exactly 50%, the Chi-Squared Test will reject it!  It
>throws out both things that don't seem to fit the model and also things that fit
>so perfectly something looks fishy.
>;-)
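
For the self-play experiment in the quoted text, the chance of an exactly even score can be computed directly. This sketch assumes two equal programs, an even number of games, and no draws, which real engine matches do not satisfy; it only illustrates why an exact 50% result becomes rarer as the match gets longer.

```python
from math import comb

def prob_exact_half(games):
    """Probability of an exactly even score between equal programs (win/loss only)."""
    return comb(games, games // 2) / 2 ** games

for games in (10, 50, 100, 1000):
    print(f"{games:5d} games: P(exactly 50%) = {prob_exact_half(games):.4f}")
```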



Last modified: Thu, 15 Apr 21 08:11:13 -0700
