Computer Chess Club Archives


Subject: Re: Statistics and Test results

Author: Rick Bischoff

Date: 22:10:59 10/07/04


I did some more research on this--

We are assuming here that the number of games an engine wins follows a binomial
distribution with win probability p, meaning that, out of "n" games played (with
no draws), we can expect the engine to win n*p games on average.

For estimating a population proportion, which in this case is the win
percentage, the sample size needed is given by

n >= ( z(a/2) / (2d) ) ^ 2

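Where z(a/2) is the critical value of the standard normal distribution that
leaves a/2 in the upper tail: 1.959964 (the 0.975 quantile) for 95% confidence
and 2.575829 (the 0.995 quantile) for 99% confidence.  "d" is the margin of
error we are willing to accept, expressed as a proportion.  The formula assumes
the worst case p = 0.5, where the variance p*(1-p) is largest.  So, let's say we
want to be within 5 percentage points (d = 0.05) of an engine's true "win"
percentage against another engine with 99% confidence: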

n >= ( 2.575829 / 0.10 ) ^2 = 663.4895 (or 664)

For 95% confidence:

n >= 384.1459 (or 385)
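
The two sample sizes above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name required_games is made up for illustration, and it assumes the worst-case p = 0.5 form of the formula and a standard library recent enough to have statistics.NormalDist (Python 3.8+).

```python
import math
from statistics import NormalDist


def required_games(margin, confidence):
    """Worst-case (p = 0.5) number of decisive games needed so that the
    estimated win proportion is within `margin` of the true value.

    `required_games` is an illustrative name, not from the original post.
    """
    alpha = 1.0 - confidence
    # Two-sided critical value of the standard normal, e.g. 1.959964 for 95%.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # n >= (z / (2d))^2, rounded up to a whole number of games.
    return math.ceil((z / (2.0 * margin)) ** 2)


print(required_games(0.05, 0.99))  # 664
print(required_games(0.05, 0.95))  # 385
```

Halving the margin to 2.5 percentage points roughly quadruples the number of games, since n grows with 1/d^2.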

That being said, we can still get a "decent" approximation with fewer games:

Say engine X plays engine Y,  and the result is +26/-4 and we wish to estimate
the true win proportion with 95% confidence:

0) n = 30 (26 wins, 4 losses)
1) calculate phat (the sample proportion): phat = 26 / 30 = 0.866667
2) calculate the critical value for the standard normal: we are using 95%
confidence, so we use the 1.959964 value.
3) the interval is then given approximately by:

phat +/- z(a/2) * sqrt( phat * (1-phat) / n )

So in this case:

0.866667 +/- 1.959964 * 0.06206329 = 0.866667 +/- 0.1216418 = [0.7450252,
0.9883088]
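
Steps 0 through 3 can be sketched in Python as follows. The helper name wald_interval is hypothetical (this normal-approximation interval is commonly called the Wald interval), and the code assumes decisive games only, as in the post.

```python
import math
from statistics import NormalDist


def wald_interval(wins, losses, confidence=0.95):
    """Normal-approximation confidence interval for the true win proportion.

    `wald_interval` is an illustrative name; draws are assumed not to occur.
    """
    n = wins + losses                      # step 0: number of games
    phat = wins / n                        # step 1: sample proportion
    # step 2: two-sided critical value of the standard normal
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    # step 3: phat +/- z * sqrt(phat * (1 - phat) / n)
    half_width = z * math.sqrt(phat * (1.0 - phat) / n)
    return phat - half_width, phat + half_width


low, high = wald_interval(26, 4)
print(f"{low:.4f} {high:.4f}")  # 0.7450 0.9883
```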

To further illustrate, let's say we ran another test between two different
engines and got the result

+9/-12

0) n = 21
1) phat = 9/21 = 0.4285714
2) z(0.975) = 1.959964
3) Same calculations as above with our new values give an interval of
(0.2169151, 0.6402277).

Note that this interval includes p=0.5, so at the 95% confidence level we cannot
say there is a significant difference between these engines' playing strengths.

Let's say we ran another test with the same engines and then got the result

+90/-120

0) n = 210
1) phat = 90/210 = 0.4285714
2) z(0.975) = 1.959964
3) Same calculations as above, but now our confidence interval is (0.3616399,
0.4955030).

This does not include p=0.5, so we can now say with 95% confidence that the
engine with 120 wins is stronger.
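
The comparison of the two results can be sketched the same way: compute the interval and check whether it contains 0.5. The function name interval_excludes_half is hypothetical, and the check is only as good as the normal approximation behind the interval.

```python
import math
from statistics import NormalDist


def interval_excludes_half(wins, losses, confidence=0.95):
    """Return the normal-approximation interval for the win proportion and
    whether it excludes p = 0.5 (i.e. whether one engine looks stronger).

    `interval_excludes_half` is an illustrative name, not from the post.
    """
    n = wins + losses
    phat = wins / n
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = z * math.sqrt(phat * (1.0 - phat) / n)
    low, high = phat - half_width, phat + half_width
    return (low, high), not (low <= 0.5 <= high)


for wins, losses in [(9, 12), (90, 120)]:
    (low, high), decided = interval_excludes_half(wins, losses)
    print(f"+{wins}/-{losses}: ({low:.3f}, {high:.3f}) excludes 0.5: {decided}")
# +9/-12: (0.217, 0.640) excludes 0.5: False
# +90/-120: (0.362, 0.496) excludes 0.5: True
```

The sample proportion is the same in both tests; only the tenfold increase in games narrows the interval enough to exclude 0.5.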

Source:
Introduction to Probability and Statistics for Scientists and Engineers, by
Walter A. Rosenkrantz.
pp. 266-269





Last modified: Thu, 15 Apr 21 08:11:13 -0700
