Author: Rick Bischoff
Date: 22:10:59 10/07/04
I did some more research on this. We are assuming here that an engine's wins follow a binomial distribution with win probability p, meaning that, out of "n" games played (with no draws), we can expect the engine to win n*p games on average.

For estimating a population proportion, which in this case is the win percentage, the sample size needed is given by

    n >= ( z(a/2) / (2d) ) ^ 2

where z(a/2) is a critical value of the standard normal distribution: z(0.975) = 1.959964 for 95% confidence and z(0.995) = 2.575829 for 99% confidence. "d" is the margin of error we are willing to accept, expressed as a proportion. This bound is conservative: it assumes the worst case p = 0.5, where p*(1-p) is largest. So, let's say we want to be within 5 percentage points (d = 0.05) of an engine's true "win" percentage against another engine with 99% confidence:

    n >= ( 2.575829 / 0.10 ) ^ 2 = 663.4895 (or 664)

For 95% confidence:

    n >= ( 1.959964 / 0.10 ) ^ 2 = 384.1459 (or 385)

That being said, we can still get a "decent" approximation with fewer games. Say engine X plays engine Y, the result is +26/-4, and we wish to estimate the true win proportion with 95% confidence:

0) n = 30 (26 wins, 4 losses)
1) Calculate phat (the sample proportion): phat = 26 / 30 = 0.866667
2) Calculate the critical value for the standard normal. We are using 95% confidence, so we use the 1.959964 value.
3) The interval is then given approximately by:

    phat +/- z(a/2) * sqrt( phat * (1-phat) / n )

So in this case:

    0.866667 +/- 1.959964 * 0.06206329 = 0.866667 +/- 0.1216418 = [0.7450252, 0.9883088]

To further illustrate, let's say we ran another test between two different engines and got the result +9/-12:

0) n = 21
1) phat = 9/21 = 0.4285714
2) z(0.975) = 1.959964
3) The same calculation as above with our new values gives an interval of (0.2169151, 0.6402277).

Note that this interval includes p = 0.5, so at the 95% confidence level this result does not show a significant difference between these engines' playing strengths.
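The sample-size bound and the interval steps above can be sketched in a few lines of Python. This is only an illustration under the same assumptions as the post (no draws, normal approximation to the binomial); the function names are made up for this example, and the critical values come from the standard library's NormalDist rather than from a table.

```python
import math
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided critical value of the standard normal (0.95 -> about 1.96)."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def games_needed(margin, confidence):
    """Conservative sample size n >= (z / (2d))^2, i.e. the worst case p = 0.5."""
    z = z_critical(confidence)
    return math.ceil((z / (2.0 * margin)) ** 2)


def win_rate_interval(wins, losses, confidence=0.95):
    """Approximate interval phat +/- z * sqrt(phat * (1 - phat) / n), draws ignored."""
    n = wins + losses
    phat = wins / n
    half_width = z_critical(confidence) * math.sqrt(phat * (1.0 - phat) / n)
    return phat - half_width, phat + half_width


print(games_needed(0.05, 0.99))   # 664
print(games_needed(0.05, 0.95))   # 385
print(win_rate_interval(26, 4))   # roughly (0.745, 0.988)
print(win_rate_interval(9, 12))   # roughly (0.217, 0.640), which contains 0.5
```

Note that this simple interval gets unreliable when n is small or phat is close to 0 or 1, so the +26/-4 figures should be read as a rough guide only.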
Let's say we ran another test with the same engines and this time got the result +90/-120:

0) n = 210
1) phat = 90/210 = 0.4285714
2) z(0.975) = 1.959964
3) The same calculation as above now gives a confidence interval of (0.3616398, 0.4955030).

This interval lies entirely below p = 0.5, so at the 95% confidence level we can now say that the engine with 120 wins is stronger.

Source: Introduction to Probability and Statistics for Scientists and Engineers, by Walter A. Rosenkrantz, pp. 266-269.
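To see why the same win ratio becomes conclusive with more games, here is a small self-contained Python sketch that recomputes the +9/-12 and +90/-120 intervals side by side. The helper name wald_interval is hypothetical, and the code assumes the same normal approximation and no draws. Since phat is identical in both matches, the half-width scales as 1/sqrt(n), so ten times as many games narrows the interval by a factor of sqrt(10), about 3.16.

```python
import math
from statistics import NormalDist


def wald_interval(wins, losses, confidence=0.95):
    """Return (phat, half_width) for the normal-approximation interval."""
    n = wins + losses
    phat = wins / n
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return phat, z * math.sqrt(phat * (1.0 - phat) / n)


for wins, losses in [(9, 12), (90, 120)]:
    phat, half = wald_interval(wins, losses)
    lo, hi = phat - half, phat + half
    # The engines are indistinguishable at this level only if 0.5 is inside.
    verdict = "includes" if lo <= 0.5 <= hi else "excludes"
    print(f"+{wins}/-{losses}: ({lo:.4f}, {hi:.4f}) {verdict} 0.5")
```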
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.