Computer Chess Club Archives



Subject: Re: Is there a statistician in the house?

Author: Colin Frayn

Date: 04:32:45 05/09/01



On May 08, 2001 at 14:05:39, Gian-Carlo Pascutto wrote:

>I can try to offer 2 solutions, but I don't know if they are good enough
>
>a) assume booklearning has no influence on the zero hypothesis, i.e.
>that the learning of Junior and Fritz is equally good. This sounds
>reasonable, but may not be correct.

I would be _very_ surprised if book learning had a significant effect on the
outcome, assuming that both programs have reasonably diverse and well-populated
books.  I've done experiments between programs with and without books and the
Elo change isn't significant except at very fast time controls.  In fact, having
a book is occasionally a hindrance because (unless the book is well tuned) the
program often ends up out of book in positions with which it isn't comfortable.
Of course, with a really clever killer book this all changes...

So, taking the above hypothesis, we can work out how significant any match
result is.

Assuming that each program has a 50% chance of winning any given game (and
ignoring draws), the number of wins for one program follows a binomial
distribution.  If the number of games N is large enough, this distribution tends
to a (continuous) Gaussian with mean N/2 and variance
Np(1-p) = N*0.5*(1-0.5) = N/4.  This gives a standard deviation (width) of
sqrt(N/4).
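
For anyone who wants to plug in their own numbers, here is a minimal Python
sketch of that calculation.  The function name null_match_stats is made up for
illustration, and it assumes every game is an independent win/loss with no
draws.

```python
import math

def null_match_stats(n_games, p_win=0.5):
    """Mean and standard deviation of the win count under the null hypothesis.

    Assumes independent games that are each won with probability p_win
    (draws are ignored in this sketch).
    """
    mean = n_games * p_win
    variance = n_games * p_win * (1.0 - p_win)
    return mean, math.sqrt(variance)

mean, sigma = null_match_stats(24)
print(mean, round(sigma, 2))  # 12.0 2.45
```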

To test whether the null hypothesis is false (i.e. whether one engine is
statistically better), we need to play a number of games and check whether
that engine's number of wins is significantly larger than the mean expected
under the null hypothesis.

In 24 games, the standard deviation is sqrt(24/4) = sqrt(6) = 2.45.  Therefore,
for me to believe that one engine is better at the 2-sigma level (i.e. roughly
95% confidence), I would look for a result of 17-7 or better (a score
2*2.45 = roughly 5 better than the mean of 12).
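
The 17-7 threshold can be reproduced with a short Python sketch.  The helper
names are hypothetical, and the exact binomial tail is an extra cross-check on
the Gaussian approximation rather than part of the argument above; both
assume independent win/loss games with no draws.

```python
import math

def two_sigma_threshold(n_games, n_sigma=2.0):
    """Smallest whole number of wins at least n_sigma above the null mean."""
    mean = n_games / 2
    sigma = math.sqrt(n_games / 4)
    return math.ceil(mean + n_sigma * sigma)

def exact_tail(n_games, wins):
    """P(X >= wins) for X ~ Binomial(n_games, 0.5), computed exactly."""
    favourable = sum(math.comb(n_games, k) for k in range(wins, n_games + 1))
    return favourable / 2 ** n_games

wins_needed = two_sigma_threshold(24)
print(wins_needed)                            # 17
print(round(exact_tail(24, wins_needed), 3))  # 0.032 (one-sided)
```

The exact figure is for one named engine reaching 17 or more wins; doubling it
gives the chance of either engine doing so.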

I must admit it annoys me when people write
"Engine X is better than engine Y! 6.5-3.5 in a 10 game match!"  That's not even
a 1-sigma result (sigma = sqrt(10/4) = 1.58, and the score is only 1.5 above the
mean of 5).  This kind of result could happen about 1/3 of the time just by
fluke.  There's still a reasonably large chance that engine Y is actually the
stronger.
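
As a rough check of the 10-game example, the following Python sketch converts
a score into a number of sigmas and then into a two-sided Gaussian probability.
The function names are invented for illustration, and counting a draw as half
a win while keeping the variance at N/4 is a simplifying assumption of the
sketch.

```python
import math

def sigmas_above_mean(score, n_games):
    """How many standard deviations the score lies above the null mean N/2."""
    return (score - n_games / 2) / math.sqrt(n_games / 4)

def two_sided_chance(z):
    """Chance that a standard normal variable deviates by at least |z|."""
    return math.erfc(abs(z) / math.sqrt(2))

z = sigmas_above_mean(6.5, 10)
print(round(z, 2))                    # 0.95
print(round(two_sided_chance(z), 2))  # 0.34
```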

Caveat - I've assumed that the game results are totally random, that there are
no external factors, and that the games are independent of one another.  I don't
see why this should not be the case, but bear in mind that two engines without
any books will always play the same two games...

Hope that helps.

Cheers,
Colin




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.