Computer Chess Club Archives



Subject: Re: Spike 1.0 Mainz is too strong for Zappa 1.1 so far 16 to 10

Author: Peter Berger

Date: 10:52:09 09/01/05


On August 31, 2005 at 08:53:56, Vasik Rajlich wrote:

>On August 31, 2005 at 06:22:49, Peter Berger wrote:
>
>>On August 31, 2005 at 04:52:11, Vasik Rajlich wrote:
>>
>>>On August 30, 2005 at 12:27:52, Peter Berger wrote:
>>>
>>>>On August 30, 2005 at 12:21:20, Maurizio De Leo wrote:
>>>>
>>>>>
>>>>>>Under valid and controlled conditions it still seems logical to me to stop a
>>>>>>test after a 5-0 result and conclude that the winning program is probably the
>>>>>>stronger one.
>>>>>
>>>>>>>I don't put much credence in any result of less than 30 games.
>>>>>>>After 30 games, then you get a lot more plausibility.
>>>>>
>>>>>>You didn't give any reason for this, so I don't understand. A 6-0 says more
>>>>>>about engine strength than the above match result with over 100000 games.
>>>>>
>>>>>Dann is right, I think.
>>>>>The confidence interval calculation assumes that the score of a game is a
>>>>>random variable with a mean value between 1 and -1 (a function of the Elo
>>>>>difference between the programs) and a standard deviation. Then, if the
>>>>>games are independent, the sum of the points will approximate the product
>>>>>(mean * number of games), with a relative standard deviation that shrinks as
>>>>>more games are played.
>>>>>With enough games the "confidence" will get to 95% when the performance
>>>>>difference between the two programs is more than 3 standard deviations.
>>>>>However, this assumes a normal distribution. The assumption can be made for any
>>>>>repeated random variable as long as the experiments are independent and there
>>>>>are "enough" of them. This "enough" is indeed given in most statistics books as 30.
>>>>>
>>>>>Maurizio
>>>>
>>>>Please have a look at "WhoisBest.zip" on Rémi Coulom's home page:
>>>>http://remi.coulom.free.fr/. It includes a short paper, Whoisbest.pdf, on
>>>>"Statistical Significance of a Match", with a very straightforward mathematical
>>>>proof that, for example, the number of draws is irrelevant when deciding who is
>>>>better in a chess match.
>>>>
>>>>Peter
>>>
>>>It's not that simple, due to the nature of chess.
>>>
>>>In chess, a match result of 2-0 with 0 draws is less significant than a match
>>>result of 2-0 with 8 draws.
>>>
>>>WhoIsBest makes the assumption that draws are independent events - that is, that
>>>wins, losses and draws each come with some independent probability. In fact, in
>>>a +2 -0 =8 result, chances are that the side with the +2 was "stronger" in the
>>>draws - i.e. closer to winning. Chess has this phenomenon where the stronger side
>>>tries to break through the draw barrier, and sometimes cannot.
>>>
>>>Of course to model this mathematically would be a huge mess.
>>>
>>>Vas
>>
>>No, that's a misunderstanding.
>>
>>The only assumption that is made is that the results get drawn independently
>>from an unknown probability distribution.
>>
>>So it doesn't matter *at all* how drawish chess itself is, for example. And the
>>result will be the same whether the game is tic-tac-toe, checkers or chess.
>>
>>Unless you want to argue that there should be a distinction between drawn games,
>>depending on how close one side got to winning. But that's a completely
>>different topic.
>>
>>Peter
>
>Ok - consider the following scenario:
>
>Two players are playing basketball. The stronger player has some >50% chance to
>score each basket. The game ends when one player scores 50 points. Once the game
>is finished, a win by a margin of under 25 points is declared a draw, while a
>win by >25 points is declared a win.
>
>The question is: in this case, is a 2-0 result with 8 draws more significant
>than 2-0 with 0 draws?
>
>Vas
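
For anyone who wants to experiment with the scenario quoted above, here is a minimal simulation sketch. It rests on several assumptions that are not in the quoted post: every basket is worth one point, each basket goes to the stronger player independently with probability p_strong, and a margin of exactly 25 counts as a win (the post leaves that case open). The names play_game and tally are hypothetical helpers for illustration only.

```python
import random


def play_game(p_strong, target=50, margin=25, rng=random):
    """Play one game; return +1 (stronger player wins), -1 (weaker wins), 0 (draw).

    Assumptions: one point per basket, baskets are independent, and a
    margin of exactly `margin` points counts as a decisive result.
    """
    strong = weak = 0
    while strong < target and weak < target:
        if rng.random() < p_strong:
            strong += 1
        else:
            weak += 1
    diff = strong - weak
    if abs(diff) < margin:
        return 0  # the leader did not win by enough: declared a draw
    return 1 if diff > 0 else -1


def tally(p_strong, games, seed=0):
    """Count wins (+1), draws (0) and losses (-1) over `games` games."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    counts = {1: 0, 0: 0, -1: 0}
    for _ in range(games):
        counts[play_game(p_strong, rng=rng)] += 1
    return counts


if __name__ == "__main__":
    for p in (0.50, 0.55, 0.60, 0.70):
        print(p, tally(p, 10_000))
```

Running it for a few values of p_strong shows how the win/draw/loss split of this particular model moves with the per-basket edge. It only describes the toy game, not chess itself.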

No, it isn't more significant for answering the question of who is the better
player.
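
A minimal sketch of why the draw count drops out under the independence model. It is not taken from Whoisbest.pdf. It is a standard Bayesian calculation that assumes games are drawn independently from fixed win/loss/draw probabilities and puts a uniform prior on those probabilities. Under those assumptions the probability that the winner's win rate exceeds its loss rate depends only on wins and losses. The function name likelihood_of_superiority is a hypothetical label.

```python
from math import comb


def likelihood_of_superiority(wins, losses, draws=0):
    """P(win probability > loss probability | match result).

    Assumptions: independent games and a uniform prior over the
    (win, loss, draw) probabilities. `draws` is accepted only to make
    the point that it does not enter the formula.
    """
    n = wins + losses + 1
    # With a uniform prior, the share of decisive games won follows a
    # Beta(wins + 1, losses + 1) posterior; its mass above 1/2 equals
    # this binomial tail sum.
    return sum(comb(n, k) for k in range(wins + 1)) / 2 ** n


if __name__ == "__main__":
    print(likelihood_of_superiority(2, 0, 0))  # 0.875
    print(likelihood_of_superiority(2, 0, 8))  # 0.875
    print(likelihood_of_superiority(5, 0))     # 0.984375
```

In this model +2 -0 =0 and +2 -0 =8 give the same number. Whether drawn games should be weighted by how close one side came to winning is exactly the separate question raised earlier in the thread.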

Peter




Last modified: Thu, 15 Apr 21 08:11:13 -0700
