Author: Vasik Rajlich
Date: 05:53:56 08/31/05
On August 31, 2005 at 06:22:49, Peter Berger wrote:

>On August 31, 2005 at 04:52:11, Vasik Rajlich wrote:
>
>>On August 30, 2005 at 12:27:52, Peter Berger wrote:
>>
>>>On August 30, 2005 at 12:21:20, Maurizio De Leo wrote:
>>>
>>>>
>>>>>Under valid and controlled conditions it still seems logical to me to stop a
>>>>>test after a 5-0 result and conclude that the winning program is probably the
>>>>>stronger one.
>>>>
>>>>>>I don't put much credence in any result of less than 30 games.
>>>>>>After 30 games, then you get a lot more plausibility.
>>>>
>>>>>You didn't give any reason for this, so I don't understand. A 6-0 says more
>>>>>about engine strength than the above match result with over 100000 games.
>>>>
>>>>Dann is right, I think.
>>>>The confidence interval calculation assumes that the score of a game is a
>>>>statistic variable with a mean value between 1 and -1 (function of the Elo
>>>>difference between the programs) and a standard deviation. Then if the
>>>>experiments are independent, the sum of the points will approximate the product
>>>>(mean*number of games) with a smaller standard deviation the more the games are.
>>>>With enough games the "confidence" will get to 95% when the performance
>>>>difference between the two programs is more than 3 standard deviations.
>>>>However this assumes a normal distribution. The assumption can be made for any
>>>>repeated statistical variable as long as the experiments are independent and
>>>>"enough". This "enough" is indeed expressed in most statistics books as 30.
>>>>
>>>>Maurizio
>>>
>>>Please have a look at "WhoisBest.zip" at Rémi Coulom's Home Page:
>>>http://remi.coulom.free.fr/. It includes a little paper Whoisbest.pdf on
>>>"Statistical Significance of a Match" , with a very straightforward mathematical
>>>proof that for example the number of draws is irrelevant to conclude who is
>>>better in a chessmatch .
>>>
>>>Peter
>>
>>It's not that simple, due to the nature of chess.
>>
>>In chess, a match result of 2-0 with 0 draws is less significant than a match
>>result of 2-0 with 8 draws.
>>
>>WhoIsBest makes the assumption that draws are independent events - that is, that
>>wins, losses and draws each come with some independent probability. In fact, in
>>a +2 -0 =8 result, the chance is that the side with the +2 was "stronger" in the
>>draws - ie. closer to winning. Chess has this phenomenon where the stronger side
>>tries to break through the draw barrier, and sometimes cannot.
>>
>>Of course to model this mathematically would be a huge mess.
>>
>>Vas
>
>No, that's a misunderstanding.
>
>The only assumption that is made is that the results get drawn independently
>from an unknown probability distribution.
>
>So it doesn't matter *at all* how drawish chess itself is e.g. . And the result
>will be the same whether the game is tic-tac-toe, checkers or chess.
>
>Unless you want to argue that there should be a distinction between drawn games,
>depending on how close one side got to winning. But that's a completely
>different topic.
>
>Peter

Ok - consider the following scenario:

Two players are playing basketball. The stronger player has some >50% chance to
score each basket. The game ends when one player scores 50 points. Once the game
is finished, a win by a margin of under 25 points is declared a draw, while a
win by >25 points is declared a win.

The question is: in this case, is a 2-0 result with 8 draws more significant
than 2-0 with 0 draws?

Vas
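
The basketball scenario can be worked through numerically. The following is only a rough sketch under several assumptions that are not in the post: baskets are independent, a margin of exactly 25 counts as a win (the post leaves that case open), and "significance" is measured as a likelihood ratio between two symmetric hypotheses, "A is the stronger player with per-basket chance p" versus "B is the stronger player with per-basket chance p". The names `outcome_probs` and `likelihood_ratio` are made up for illustration.

```python
from math import comb

TARGET = 50       # points needed to end a game (from the scenario)
WIN_MARGIN = 25   # assumption: a margin of exactly 25 counts as a win


def outcome_probs(p):
    """Return (P(A wins), P(draw), P(A loses)) when A scores each
    basket with probability p and baskets are independent."""
    def decisive(q):
        # Chance that the player with per-basket probability q reaches
        # TARGET while the opponent has at most TARGET - WIN_MARGIN points
        # (negative binomial: opponent has exactly k points at the end).
        return sum(comb(TARGET - 1 + k, k) * q**TARGET * (1 - q)**k
                   for k in range(TARGET - WIN_MARGIN + 1))

    win, loss = decisive(p), decisive(1 - p)
    return win, 1.0 - win - loss, loss


def likelihood_ratio(wins, draws, p):
    """Likelihood of a +wins -0 =draws match for A if A is the stronger
    player (per-basket p), divided by the likelihood if B is (A gets 1 - p).
    Only the win/draw/loss record is used; margins are ignored."""
    w_a, d_a, _ = outcome_probs(p)
    w_b, d_b, _ = outcome_probs(1 - p)
    return (w_a**wins * d_a**draws) / (w_b**wins * d_b**draws)


if __name__ == "__main__":
    p = 0.55  # hypothetical per-basket chance for the stronger player
    print("2-0 with 0 draws:", likelihood_ratio(2, 0, p))
    print("2-0 with 8 draws:", likelihood_ratio(2, 8, p))
```

With these particular assumptions a draw is equally likely whichever player is the stronger one, so the draw term cancels and both match results give the same ratio. Any extra information in the eight draws would have to come from the margins inside those games, which a plain win/draw/loss record does not keep. A different choice of hypotheses (for example a prior spread over many values of p) could change that picture.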
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.