Author: Rick Bischoff
Date: 04:30:46 10/07/04
>Calculate the net Score S=W-L and the number of Results R=W+L.
>Now calculate T=abs(S/sqrt(R)).
>
>If two programs are equal strength, then 95% of test runs will have T<2, and
>99.9% of test runs will have T<3. So if T>3 it's unlikely to have happened by
>chance. Even T>2 is pretty good. Less than 2 tells you you don't have enough
>games.

You are assuming that game results are randomly distributed, AND you are assuming they follow some particular distribution (though I really have no idea what the distribution of ABS(S/Sqrt(R)) is). The distribution part might be right, but you have provided no evidence to that effect, so the entire test is suspect. Worse, since the sample isn't random, the entire test is meaningless.

Side note: if you could test engines like that, you would use the binomial distribution, and you would need more than 30 random games from those engines to properly test the probability of one engine beating the other. However, since it is not really possible to get a "random game", you will need to do what others on this board have suggested and increase the sample size to 1000 games or so.
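For illustration, here is a minimal Python sketch of the quoted T test next to the exact binomial (sign) test mentioned in the side note. It assumes draws are discarded and that, if the engines are equal, every decisive game is an independent 50/50 coin flip, which is exactly the assumption disputed above. Under that assumption S/sqrt(R) is approximately standard normal, so T is approximately half-normal. The function names (t_statistic, normal_p_value, binomial_p_value, share_below) are hypothetical and exist only for this sketch.

```python
import math
import random


def t_statistic(wins: int, losses: int) -> float:
    """T = |S| / sqrt(R), with S = W - L and R = W + L (draws are ignored)."""
    results = wins + losses
    if results == 0:
        raise ValueError("need at least one decisive game")
    return abs(wins - losses) / math.sqrt(results)


def normal_p_value(t: float) -> float:
    """Two-sided tail P(|Z| >= t) for a standard normal Z (normal approximation)."""
    return math.erfc(t / math.sqrt(2))


def binomial_p_value(wins: int, losses: int) -> float:
    """Exact two-sided sign test of H0: each decisive game is a fair coin flip."""
    n = wins + losses
    k = max(wins, losses)
    # P(X >= k) for X ~ Binomial(n, 1/2); the distribution is symmetric, so double it.
    tail = sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def share_below(threshold: float, games: int, runs: int, seed: int = 1) -> float:
    """Monte Carlo estimate of the fraction of runs with T < threshold.

    ASSUMPTION: both engines are equal and every game is an independent fair
    coin flip with no draws. Real engine games need not behave this way.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        wins = sum(rng.random() < 0.5 for _ in range(games))
        if t_statistic(wins, games - wins) < threshold:
            hits += 1
    return hits / runs


if __name__ == "__main__":
    w, l = 60, 40
    t = t_statistic(w, l)
    print(f"T = {t:.2f}")
    print(f"normal approx p = {normal_p_value(t):.4f}")
    print(f"exact binomial p = {binomial_p_value(w, l):.4f}")
    print(f"share of 100-game runs with T < 2: {share_below(2.0, 100, 2000):.3f}")
```

With 60 wins and 40 losses, T is exactly 2; the normal approximation gives a two-sided p of about 0.046, while the exact binomial test gives about 0.057. Under the normal approximation, P(T < 2) is about 95.4% and P(T < 3) about 99.7%; with a finite number of games the exact coverage differs somewhat because S only takes discrete values.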
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.