Author: Vasik Rajlich
Date: 05:58:10 08/31/05
On August 31, 2005 at 04:59:18, Uri Blass wrote:

>On August 31, 2005 at 04:52:11, Vasik Rajlich wrote:
>
>>On August 30, 2005 at 12:27:52, Peter Berger wrote:
>>
>>>On August 30, 2005 at 12:21:20, Maurizio De Leo wrote:
>>>
>>>>
>>>>>Under valid and controlled conditions it still seems logical to me to stop a
>>>>>test after a 5-0 result and conclude that the winning program is probably the
>>>>>stronger one.
>>>>
>>>>>>I don't put much credence in any result of less than 30 games.
>>>>>>After 30 games, then you get a lot more plausibility.
>>>>
>>>>>You didn't give any reason for this, so I don't understand. A 6-0 says more
>>>>>about engine strength than the above match result with over 100000 games.
>>>>
>>>>Dann is right, I think.
>>>>The confidence interval calculation assumes that the score of a game is a
>>>>statistic variable with a mean value between 1 and -1 (function of the Elo
>>>>difference between the programs) and a standard deviation. Then if the
>>>>experiments are independent, the sum of the points will approximate the product
>>>>(mean*number of games) with a smaller standard deviation the more the games are.
>>>>With enough games the "confidence" will get to 95% when the performance
>>>>difference between the two programs is more than 3 standard deviations.
>>>>However this assumes a normal distribution. The assumption can be made for any
>>>>repeated statistical variable as long as the experiments are independent and
>>>>"enough". This "enough" is indeed expressed in most statistics books as 30.
>>>>
>>>>Maurizio
>>>
>>>Please have a look at "WhoisBest.zip" at Rémi Coulom's Home Page:
>>>http://remi.coulom.free.fr/. It includes a little paper Whoisbest.pdf on
>>>"Statistical Significance of a Match" , with a very straightforward mathematical
>>>proof that for example the number of draws is irrelevant to conclude who is
>>>better in a chessmatch .
>>>
>>>Peter
>>
>>It's not that simple, due to the nature of chess.
>>
>>In chess, a match result of 2-0 with 0 draws is less significant than a match
>>result of 2-0 with 8 draws.
>
>I disagree.
>
>If you want only to answer the question who is best both results give the same
>information.
>
>>
>>WhoIsBest makes the assumption that draws are independent events - that is, that
>>wins, losses and draws each come with some independent probability. In fact, in
>>a +2 -0 =8 result, the chance is that the side with the +2 was "stronger" in the
>>draws - ie. closer to winning.
>
>It is not clear without looking at the games.
>
>
>
> Chess has this phenomenon where the stronger side
>>tries to break through the draw barrier, and sometimes cannot.
>
>It may also be the case that one side is better in the middle game and the other
>side is better in the endgame so the side that is better in the endgame often
>save inferior positions.
>
>Uri

Yes, stronger players also escape with draws, etc. Here we are talking purely about statistics.

If you're really curious, try the following experiment: take a sample of drawn games, evaluate the final positions (or some range of close-to-final positions), and see if there is a correlation between having a higher rating and having better positions. I'm pretty sure that this correlation exists.

Vas
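The experiment proposed above could be sketched roughly as follows. This is only an illustration under stated assumptions: the input format (a tuple of White's rating, Black's rating, and an engine evaluation of the final position in centipawns from White's point of view) and the function names `pearson` and `draw_correlation` are hypothetical, and a plain Pearson coefficient is just one possible way to measure the correlation in question.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def draw_correlation(drawn_games):
    """Correlate rating difference with final-position evaluation.

    drawn_games: list of (white_elo, black_elo, final_eval_cp) tuples,
    one per drawn game (hypothetical format; final_eval_cp is an engine
    score in centipawns from White's point of view).
    """
    rating_diffs = [white - black for white, black, _ in drawn_games]
    final_evals = [cp for _, _, cp in drawn_games]
    return pearson(rating_diffs, final_evals)
```

A clearly positive value on a large sample of drawn games would support the claim that the higher-rated side tends to hold the better position in draws; a value near zero would suggest draws carry no such information.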
Last modified: Thu, 15 Apr 21 08:11:13 -0700