Author: Walter Koroljow
Date: 09:13:31 02/06/01
On February 05, 2001 at 22:39:18, Walter Koroljow wrote:
>Thank you for the detailed reply. I didn't know what the real problem was. But
>maybe I can help reconcile the two cases you talked about and take a small chip
>out of the general case. As you did, let's forget about draws. Let's compare
>the probability of getting 60-40, say, with the stronger program to the
>probability of getting it with the weaker program. In general, a very good
>simple approximation to the answer is:
>
>Ratio = exp(8*eps*(x-N/2))
>
>Where:
>
>x = program score,
>N = number of games played,
>eps = We - 0.5 where We = win expectancy.
>
>This approximation is excellent as long as eps is not too big and as long as N
>is on the order of 10 or more. Let us try a few cases:
>
>N = 100, x = 60, We = 0.55, eps = 0.05.
>
>Then Ratio = exp (8*0.05*(60-50)) = exp (4) = 54.6.
>
>But, with the usual approach, using the (60,40) binomial term instead, canceling
>some terms in the numerator and denominator, we get
>
>Ratio = (.55/.45)^20 = 55.33. A 1.3% error.
>
>But the approximation becomes extremely accurate as We approaches 0.5. Try:
>
>N=100, x = 60, We = 0.51, eps = .01.
>
>Ratio = exp(0.8) = 2.2255. Compare this to (.51/.49)^20 = 2.2258.
>
>This approximation means that two test results are equivalent as long as they
>have the same value of eps*(x-N/2), since then the chance of confusing the two
>programs is the same for the two results. So here are some equivalent test
>results with a probability of confusing the two programs at 8%:
>
>N      x      We
>10     10     .56
>20     15     .56
>20     20     .53
>60     40     .53
>60     60     .51
>100    60     .53
>
>It is hard to measure small differences!
>
>For the record, the approximation is derived by approximating the binomial
>distribution of scores with a normal distribution and then using it to calculate
>the ratio of the two probabilities for getting the observed result with the two
>programs. I can supply more detail if anyone is interested.
>
>Good luck,
>
>Walter
Looking back, I notice that I used x-N/2. This is all easier to understand if
we instead substitute x-N/2 = (W-L)/2. Then the formula becomes:

Ratio = exp(4*eps*(W-L))

Where:

W = wins,
L = losses,
eps = We - 0.5, where We = win expectancy,
Ratio = ratio of the two probabilities of getting the result with the two
programs.

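For anyone who wants to see where W-L comes from, here is a short sketch that starts from the exact binomial terms rather than from the normal approximation. The symbols p and q are shorthand introduced here for We and 1 - We, draws are ignored so that N = W + L, and the layout assumes the amsmath package.

```latex
\begin{align*}
\text{Ratio}
  &= \frac{\binom{N}{W}\, p^{W} q^{L}}{\binom{N}{W}\, q^{W} p^{L}}
   = \left(\frac{p}{q}\right)^{W-L},
   \qquad p = \tfrac{1}{2} + \varepsilon,\quad q = \tfrac{1}{2} - \varepsilon, \\
\ln\frac{p}{q}
  &= \ln\frac{1 + 2\varepsilon}{1 - 2\varepsilon}
   = 4\varepsilon + \tfrac{16}{3}\,\varepsilon^{3} + \cdots, \\
\text{Ratio}
  &\approx \exp\bigl(4\,\varepsilon\,(W - L)\bigr)
   \quad \text{for small } \varepsilon .
\end{align*}
```

The binomial coefficient cancels in the first line, so only W-L survives, and the first neglected term is of order eps^3, which fits the formula getting better as We approaches 0.5.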
Uri Blass first pointed out the importance of W-L in another post.
Now we can have an easier-to-understand table of equivalent test results which
give a probability of confusion of 8%:

Wins - Losses    Win expectancy
-------------    --------------
     10               .56
     20               .53
     40               .515
     60               .51
    120               .505

In all cases, the number of games played does not matter, again first pointed
out by Uri. How is that for counterintuitive?
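As a rough check on the 8% figure, here is a minimal Python sketch. It assumes that "probability of confusion" means the chance that the weaker program produced the result when both programs were equally likely beforehand, i.e. 1/(1 + Ratio). That reading, and the function name confusion_probability, are assumptions made for this illustration.

```python
import math


def confusion_probability(wins_minus_losses: int, we: float) -> float:
    """Chance that the result came from the weaker program.

    Assumes equal prior odds for the two programs (an assumption of this
    sketch), so the probability is 1 / (1 + Ratio), with
    Ratio = exp(4 * eps * (W - L)) and eps = We - 0.5.
    """
    eps = we - 0.5
    ratio = math.exp(4.0 * eps * wins_minus_losses)
    return 1.0 / (1.0 + ratio)


if __name__ == "__main__":
    # Two results from the quoted post: 10-0 at We = .56 and 20-0 at We = .53.
    for margin, we in ((10, 0.56), (20, 0.53)):
        p = confusion_probability(margin, we)
        print(f"W-L = {margin:3d}, We = {we:.3f}: confusion = {p:.1%}")
```

Both lines print roughly the same value, since the two results share the same eps*(W-L). The function never takes the number of games as an argument, which is the counterintuitive point.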
Of course, this is no longer true for large We where the formula becomes
inaccurate. However, in tests, the formula is pretty accurate for We = .6, and
even sometimes for We = 0.65. For example, for the 10-0 case with We = 0.6, the
formula gives 54.6 versus the correct 57.7, and for We = 0.65, the formula gives
403 versus the correct 488.
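These accuracy figures can be checked with a few lines of Python. This is only a sketch: the function names approx_ratio and exact_ratio are made up here, and the exact value is taken to be (We/(1-We))^(W-L), the same form as the (.55/.45)^20 term in the quoted post.

```python
import math


def approx_ratio(wins_minus_losses: int, we: float) -> float:
    """Approximate ratio: exp(4 * eps * (W - L)), with eps = We - 0.5."""
    eps = we - 0.5
    return math.exp(4.0 * eps * wins_minus_losses)


def exact_ratio(wins_minus_losses: int, we: float) -> float:
    """Exact ratio of the two binomial terms: (We / (1 - We)) ** (W - L)."""
    return (we / (1.0 - we)) ** wins_minus_losses


if __name__ == "__main__":
    margin = 10  # the 10-0 case
    for we in (0.51, 0.55, 0.60, 0.65):
        a = approx_ratio(margin, we)
        e = exact_ratio(margin, we)
        rel_err = (e - a) / e
        print(f"We = {we:.2f}: approx = {a:7.1f}, exact = {e:7.1f}, "
              f"relative error = {rel_err:.1%}")
```

The relative error grows with We, which matches the formula being usable at We = .6 and noticeably off by We = .65.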
Cheers,
Walter
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.