Computer Chess Club Archives



Subject: Re: Statistics and Test results

Author: Rick Bischoff

Date: 04:30:46 10/07/04



>Calculate the net Score  S=W-L and the number of Results  R=W+L.
>Now calculate T=abs(S/sqrt(R)).
>
>If two programs are equal strength, then 95% of test runs will have T<2, and
>99.9% of test runs will have T<3. So if T>3 it's unlikely to have happened by
>chance. Even T>2 is pretty good. Less than 2 tells you you don't have enough
>games.
>

You are assuming that game results are randomly distributed AND you are assuming
they will follow some distribution (though I really have no idea what the
distribution of ABS(S/Sqrt(R)) is).

The distribution thing might be right, but you have provided no evidence to that
effect, so the entire test is suspect.
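
One way to gather such evidence is a quick simulation. The sketch below is only an illustration under an idealized assumption that is exactly the one in dispute: every decisive game is an independent coin flip between two equal engines, and draws are ignored. The function names and the parameters (400 decisive games per run, 5000 runs) are made up for the example. Under that assumption S/Sqrt(R) is approximately standard normal, so one would expect roughly 95% of runs with T<2 and roughly 99.7% with T<3.

```python
import math
import random


def t_statistic(wins, losses):
    """T = |W - L| / sqrt(W + L), as defined in the quoted post."""
    results = wins + losses
    if results == 0:
        raise ValueError("need at least one decisive game")
    return abs(wins - losses) / math.sqrt(results)


def simulate_t_fractions(decisive_games=400, runs=5000, seed=1):
    """Estimate P(T < 2) and P(T < 3) for two equal engines.

    Assumption (hypothetical, not established for real engines):
    each decisive game is an independent fair coin flip.
    """
    rng = random.Random(seed)
    below_2 = 0
    below_3 = 0
    for _ in range(runs):
        # Count wins for one side in a run of independent fair games.
        wins = sum(rng.random() < 0.5 for _ in range(decisive_games))
        t = t_statistic(wins, decisive_games - wins)
        below_2 += t < 2
        below_3 += t < 3
    return below_2 / runs, below_3 / runs


if __name__ == "__main__":
    frac_2, frac_3 = simulate_t_fractions()
    print(f"fraction of runs with T < 2: {frac_2:.3f}")
    print(f"fraction of runs with T < 3: {frac_3:.3f}")
```

This says nothing about whether real engine games behave like independent coin flips; it only shows what the thresholds mean if they do.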

However, since the sample isn't random, the entire test is meaningless.

Side note:  If you could test engines like that, you would use the binomial
distribution and would need more than 30 random games from those engines to
properly test the probability of one engine winning over another.  However,
since it is not really possible to get a "random game", you will need, as
others on this board have suggested, to increase the sample size to 1000 or
so.
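
For reference, the binomial test mentioned in the side note could be sketched as follows. This is a minimal illustration, not a recommended procedure: it assumes the games are independent, drops draws, and tests the null hypothesis that each engine wins a decisive game with probability 1/2. The function name binomial_two_sided_p is invented for the example.

```python
from math import comb


def binomial_two_sided_p(wins, losses):
    """Exact two-sided binomial p-value for H0: P(win) = 1/2.

    Assumptions (hypothetical): games are independent and draws
    have already been discarded.
    """
    n = wins + losses
    if n == 0:
        raise ValueError("need at least one decisive game")
    k = max(wins, losses)
    # Probability of a result at least this lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # Double it for the two-sided test; the distribution is symmetric at p = 1/2.
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    for wins, losses in [(18, 12), (60, 40), (540, 460)]:
        p = binomial_two_sided_p(wins, losses)
        print(f"W={wins} L={losses}: two-sided p = {p:.4f}")
```

A small p-value here only means the score would be unusual for equal engines if the independence assumption held.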




Last modified: Thu, 15 Apr 21 08:11:13 -0700
