Author: Peter Berger
Date: 10:52:09 09/01/05
On August 31, 2005 at 08:53:56, Vasik Rajlich wrote:

>On August 31, 2005 at 06:22:49, Peter Berger wrote:
>
>>On August 31, 2005 at 04:52:11, Vasik Rajlich wrote:
>>
>>>On August 30, 2005 at 12:27:52, Peter Berger wrote:
>>>
>>>>On August 30, 2005 at 12:21:20, Maurizio De Leo wrote:
>>>>
>>>>>
>>>>>>Under valid and controlled conditions it still seems logical to me to stop a
>>>>>>test after a 5-0 result and conclude that the winning program is probably the
>>>>>>stronger one.
>>>>>
>>>>>>>I don't put much credence in any result of less than 30 games.
>>>>>>>After 30 games, then you get a lot more plausibility.
>>>>>
>>>>>>You didn't give any reason for this, so I don't understand. A 6-0 says more
>>>>>>about engine strength than the above match result with over 100000 games.
>>>>>
>>>>>Dann is right, I think.
>>>>>The confidence interval calculation assumes that the score of a game is a
>>>>>statistical variable with a mean value between 1 and -1 (a function of the Elo
>>>>>difference between the programs) and a standard deviation. Then, if the
>>>>>experiments are independent, the sum of the points will approximate the product
>>>>>(mean*number of games), with a smaller standard deviation the more games there are.
>>>>>With enough games the "confidence" will get to 95% when the performance
>>>>>difference between the two programs is more than 3 standard deviations.
>>>>>However, this assumes a normal distribution. The assumption can be made for any
>>>>>repeated statistical variable as long as the experiments are independent and
>>>>>"enough". This "enough" is indeed expressed in most statistics books as 30.
>>>>>
>>>>>Maurizio
>>>>
>>>>Please have a look at "WhoisBest.zip" at Rémi Coulom's Home Page:
>>>>http://remi.coulom.free.fr/. It includes a little paper, Whoisbest.pdf, on
>>>>"Statistical Significance of a Match", with a very straightforward mathematical
>>>>proof that, for example, the number of draws is irrelevant to concluding who is
>>>>better in a chess match.
>>>>
>>>>Peter
>>>
>>>It's not that simple, due to the nature of chess.
>>>
>>>In chess, a match result of 2-0 with 0 draws is less significant than a match
>>>result of 2-0 with 8 draws.
>>>
>>>WhoIsBest makes the assumption that draws are independent events - that is, that
>>>wins, losses and draws each come with some independent probability. In fact, in
>>>a +2 -0 =8 result, the chances are that the side with the +2 was "stronger" in the
>>>draws - i.e. closer to winning. Chess has this phenomenon where the stronger side
>>>tries to break through the draw barrier, and sometimes cannot.
>>>
>>>Of course, to model this mathematically would be a huge mess.
>>>
>>>Vas
>>
>>No, that's a misunderstanding.
>>
>>The only assumption that is made is that the results are drawn independently
>>from an unknown probability distribution.
>>
>>So it doesn't matter *at all*, for example, how drawish chess itself is. And the
>>result will be the same whether the game is tic-tac-toe, checkers or chess.
>>
>>Unless you want to argue that there should be a distinction between drawn games,
>>depending on how close one side got to winning. But that's a completely
>>different topic.
>>
>>Peter
>
>Ok - consider the following scenario:
>
>Two players are playing basketball. The stronger player has some >50% chance to
>score each basket. The game ends when one player scores 50 points. Once the game
>is finished, a win by a margin of under 25 points is declared a draw, while a
>win by >25 points is declared a win.
>
>The question is: in this case, is a 2-0 result with 8 draws more significant
>than 2-0 with 0 draws?
>
>Vas

No, it isn't more significant in answering the question of who is the better player.

Peter
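One way to make the "draws are irrelevant" argument concrete is the following sketch. It rests on assumptions that the thread does not spell out: each game is an independent sample from one fixed win/draw/loss distribution, and the prior over (p_win, p_draw, p_loss) is uniform, i.e. Dirichlet(1, 1, 1). It is not necessarily the exact derivation in Whoisbest.pdf, and the function names (`likelihood_of_superiority`, `los_monte_carlo`, `los_normal`) are hypothetical. Under this model the posterior share p_win / (p_win + p_loss) is Beta(W+1, L+1) whatever the number of draws, so the probability that A is the stronger side depends only on wins and losses.

```python
import math
import random
from fractions import Fraction


def likelihood_of_superiority(wins: int, losses: int) -> Fraction:
    """Exact posterior P(p_win > p_loss) for player A.

    Assumed model (hypothetical, not from the thread): independent games from
    one fixed win/draw/loss distribution with a uniform Dirichlet(1, 1, 1)
    prior. The posterior is Dirichlet(wins+1, draws+1, losses+1), and
    p_win / (p_win + p_loss) follows Beta(wins+1, losses+1) regardless of the
    draw count, which is why draws are not an argument here.
    For integer parameters, P(Beta(W+1, L+1) > 1/2) = P(Binomial(W+L+1, 1/2) <= W).
    """
    n = wins + losses + 1
    favourable = sum(math.comb(n, k) for k in range(wins + 1))
    return Fraction(favourable, 2 ** n)


def los_monte_carlo(wins: int, draws: int, losses: int,
                    samples: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the same probability from the full posterior.

    Each Dirichlet sample is three Gamma draws normalised by their sum. The
    draw share is sampled as well, but dividing by the common total cannot
    change which of p_win and p_loss is larger.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        g_win = rng.gammavariate(wins + 1, 1.0)
        g_draw = rng.gammavariate(draws + 1, 1.0)
        g_loss = rng.gammavariate(losses + 1, 1.0)
        total = g_win + g_draw + g_loss
        p_win, p_loss = g_win / total, g_loss / total
        if p_win > p_loss:
            hits += 1
    return hits / samples


def los_normal(wins: int, losses: int) -> float:
    """Common normal approximation of the same quantity (it also ignores draws)."""
    if wins + losses == 0:
        return 0.5
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))


if __name__ == "__main__":
    # Compare 2-0 without and with 8 draws, plus the 5-0 and 6-0 cases from the thread.
    for w, d, l in [(2, 0, 0), (2, 8, 0), (5, 0, 0), (6, 0, 0)]:
        exact = likelihood_of_superiority(w, l)
        print(f"+{w} ={d} -{l}: exact {float(exact):.4f}, "
              f"Monte Carlo {los_monte_carlo(w, d, l):.4f}, "
              f"normal approx. {los_normal(w, l):.4f}")
```

Under these assumptions +2 =0 -0 and +2 =8 -0 both give 7/8 = 0.875, and 5-0 gives 63/64. For a 2-0 score the normal approximation gives about 0.92, visibly off from the exact value, which is one concrete sense in which small samples need care. The model treats every draw alike, so it does not capture a distinction between draws in which one side came close to winning and draws in which it did not. For comparison with the normal-approximation discussion above, a two-sided 95% level corresponds to about 1.96 standard deviations.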
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.