Author: Colin Frayn
Date: 04:32:45 05/09/01
On May 08, 2001 at 14:05:39, Gian-Carlo Pascutto wrote:

>I can try to offer 2 solutions, but I don't know if they are good enough
>
>a) assume booklearning has no influence on the zero hypothesis, i.e.
>that the learning of Junior and Fritz is equally good. This sounds
>reasonable, but may not be correct.

I would be _very_ surprised if book learning had a significant effect on the outcome, assuming that both programs have reasonably diverse and well-populated books. I've done experiments between programs with and without books and the Elo change isn't significant except at very fast time controls. In fact, having a book is occasionally a hindrance because (unless the book is well tuned) the program often ends up out of book in positions with which it isn't comfortable. Of course, with a really clever killer book this all changes...

So, taking the above hypothesis, we can work out how significant any win is. Assuming that the chance of a win for either program is 50%, we can work out the expected number of wins. If the number of games N is large enough, this binomial distribution tends to a (continuous) Gaussian with mean N/2 and variance Np(1-p) = N*0.5*(1-0.5) = N/4. This gives a width (standard deviation) of sqrt(N/4).

To test whether the null hypothesis is false (i.e. whether one engine is statistically better), we need to play a number of games and check whether the number of wins for that engine is significantly larger than the mean expected under the null hypothesis.

In 24 games, the standard deviation is sqrt(24/4) = sqrt(6) = 2.45. Therefore, for me to believe that one engine is better at the 2-sigma level (i.e. roughly 95% confidence), I would look for a result of 17-7 or better (a score 2.45*2 = roughly 5 better than the mean of 12).

I must admit it annoys me when people write "Engine X is better than engine Y! 6.5-3.5 in a 10 game match!" That's not even a 1-sigma result: in 10 games the standard deviation is sqrt(10/4) = 1.58, and 6.5 is only 1.5 above the mean of 5. This kind of result could happen 1/3 of the time just by fluke.
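For anyone who wants to check these numbers, here is a rough Python sketch of the arithmetic above. It assumes the same simplified model as the post (each game is an independent 50/50 win-or-loss, Gaussian approximation to the binomial); the function names are made up for illustration, and treating a draw as half a win is an extra simplification, not something the model really covers.

```python
import math

def sigma(n_games, p=0.5):
    """Standard deviation of the number of wins under the null hypothesis."""
    return math.sqrt(n_games * p * (1 - p))

def z_score(score, n_games, p=0.5):
    """How many standard deviations a score lies above the expected mean."""
    return (score - n_games * p) / sigma(n_games, p)

def score_needed(n_games, n_sigma=2.0, p=0.5):
    """Score required to sit n_sigma above the mean (not rounded)."""
    return n_games * p + n_sigma * sigma(n_games, p)

def two_sided_tail(z):
    """Gaussian probability of a deviation at least |z| in either direction."""
    return math.erfc(abs(z) / math.sqrt(2))

if __name__ == "__main__":
    # 24-game match: sigma is about 2.45, so 2 sigma above 12 is about 16.9,
    # i.e. 17-7 or better.
    print(sigma(24), score_needed(24))

    # 6.5-3.5 in 10 games: z is a little under 1, and a deviation that large
    # turns up by chance roughly a third of the time.
    z = z_score(6.5, 10)
    print(z, two_sided_tail(z))
```

Rounding score_needed(24) up to the next whole game gives the 17-7 threshold quoted above. For small N an exact binomial tail would be more accurate than the Gaussian approximation, but the conclusion for a 10-game match does not change much.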
In fact, after a result like 6.5-3.5 there's still a reasonably large chance that engine Y is the stronger.

Caveat - I've assumed that the events are totally random, that there are no external factors, and that the games are independent of each other. I don't see why this should not be the case, but bear in mind that two engines without any books will always play the same two games....

Hope that helps.

Cheers,
Colin
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.