Author: Peter Berger
Date: 03:22:49 08/31/05
On August 31, 2005 at 04:52:11, Vasik Rajlich wrote:

>On August 30, 2005 at 12:27:52, Peter Berger wrote:
>
>>On August 30, 2005 at 12:21:20, Maurizio De Leo wrote:
>>
>>>
>>>>Under valid and controlled conditions it still seems logical to me to stop a
>>>>test after a 5-0 result and conclude that the winning program is probably the
>>>>stronger one.
>>>
>>>>>I don't put much credence in any result of less than 30 games.
>>>>>After 30 games, then you get a lot more plausibility.
>>>
>>>>You didn't give any reason for this, so I don't understand. A 6-0 says more
>>>>about engine strength than the above match result with over 100000 games.
>>>
>>>Dann is right, I think.
>>>The confidence interval calculation assumes that the score of a game is a
>>>statistic variable with a mean value between 1 and -1 (function of the Elo
>>>difference between the programs) and a standard deviation. Then if the
>>>experiments are independent, the sum of the points will approximate the product
>>>(mean*number of games) with a smaller standard deviation the more the games are.
>>>With enough games the "confidence" will get to 95% when the performance
>>>difference between the two programs is more than 3 standard deviations.
>>>However this assumes a normal distribution. The assumption can be made for any
>>>repeated statistical variable as long as the experiments are independent and
>>>"enough". This "enough" is indeed expressed in most statistics books as 30.
>>>
>>>Maurizio
>>
>>Please have a look at "WhoisBest.zip" at Rémi Coulom's Home Page:
>>http://remi.coulom.free.fr/. It includes a little paper Whoisbest.pdf on
>>"Statistical Significance of a Match", with a very straightforward mathematical
>>proof that for example the number of draws is irrelevant to conclude who is
>>better in a chessmatch.
>>
>>Peter
>
>It's not that simple, due to the nature of chess.
>
>In chess, a match result of 2-0 with 0 draws is less significant than a match
>result of 2-0 with 8 draws.
>
>WhoIsBest makes the assumption that draws are independent events - that is, that
>wins, losses and draws each come with some independent probability. In fact, in
>a +2 -0 =8 result, the chance is that the side with the +2 was "stronger" in the
>draws - ie. closer to winning. Chess has this phenomenon where the stronger side
>tries to break through the draw barrier, and sometimes cannot.
>
>Of course to model this mathematically would be a huge mess.
>
>Vas

No, that's a misunderstanding. The only assumption that is made is that the
results get drawn independently from an unknown probability distribution. So it
doesn't matter *at all* how drawish chess itself is, for example, and the result
will be the same whether the game is tic-tac-toe, checkers or chess.

Unless you want to argue that there should be a distinction between drawn games,
depending on how close one side got to winning. But that's a completely
different topic.

Peter
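To make the independence argument concrete, here is a minimal sketch in Python. It is not taken from WhoIsBest itself. It assumes a uniform prior over the unknown (win, draw, loss) probabilities, which is my assumption for illustration, and the function names are made up. Under that assumption the chance that the winner really is the better side has a closed form that uses only wins and losses. The Monte Carlo check does include the draws, and they drop out because they only rescale both sides by the same factor.

```python
import random
from math import comb


def prob_superior(wins: int, losses: int) -> float:
    """P(p_win > p_loss | result), assuming a uniform prior (an assumption).

    Equals P(Beta(wins + 1, losses + 1) > 1/2), written as a binomial sum.
    The number of draws does not appear anywhere.
    """
    n = wins + losses + 1
    return sum(comb(n, k) for k in range(wins + 1)) / 2 ** n


def prob_superior_mc(wins: int, losses: int, draws: int,
                     samples: int = 50_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity, with draws included.

    Samples (p_win, p_draw, p_loss) from the Dirichlet posterior by
    normalising three gamma variates, then counts how often p_win > p_loss.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        g_win = rng.gammavariate(wins + 1, 1.0)
        g_draw = rng.gammavariate(draws + 1, 1.0)
        g_loss = rng.gammavariate(losses + 1, 1.0)
        total = g_win + g_draw + g_loss
        # Dividing by the common total cannot change which side is larger,
        # which is why the draw count has no influence on the answer.
        if g_win / total > g_loss / total:
            hits += 1
    return hits / samples


print(prob_superior(2, 0))          # 0.875, for +2 -0 with any number of draws
print(prob_superior_mc(2, 0, 0))    # estimate for +2 -0 =0
print(prob_superior_mc(2, 0, 8))    # estimate for +2 -0 =8
```

Both estimates should land close to the closed-form value, up to sampling noise. Whether drawn games should count differently depending on how close one side came to winning is outside what this model can express, since it only sees the result of each game.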
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.