Computer Chess Club Archives


Subject: Re: Spike 1.0 Mainz is too strong for Zappa 1.1 so far 16 to 10

Author: Uri Blass

Date: 01:59:18 08/31/05


On August 31, 2005 at 04:52:11, Vasik Rajlich wrote:

>On August 30, 2005 at 12:27:52, Peter Berger wrote:
>
>>On August 30, 2005 at 12:21:20, Maurizio De Leo wrote:
>>
>>>
>>>>Under valid and controlled conditions it still seems logical to me to stop a
>>>>test after a 5-0 result and conclude that the winning program is probably the
>>>>stronger one.
>>>
>>>>>I don't put much credence in any result of less than 30 games.
>>>>>After 30 games, then you get a lot more plausibility.
>>>
>>>>You didn't give any reason for this, so I don't understand. A 6-0 says more
>>>>about engine strength than the above match result with over 100000 games.
>>>
>>>Dann is right, I think.
>>>The confidence interval calculation assumes that the score of a game is a
>>>statistic variable with a mean value between 1 and -1 (function of the Elo
>>>difference between the programs) and a standard deviation. Then if the
>>>experiments are independent, the sum of the points will approximate the product
>>>(mean*number of games) with a smaller standard deviation the more the games are.
>>>With enough games the "confidence" will get to 95% when the performance
>>>difference between the two programs is more than 3 standard deviations.
>>>However this assumes a normal distribution. The assumption can be made for any
>>>repeated statistical variable as long as the experiments are independent and
>>>"enough". This "enough" is indeed expressed in most statistics books as 30.
>>>
>>>Maurizio
>>
>>Please have a look at "WhoisBest.zip" at Rémi Coulom's Home Page:
>>http://remi.coulom.free.fr/. It includes a little paper Whoisbest.pdf on
>>"Statistical Significance of a Match" , with a very straightforward mathematical
>>proof that for example the number of draws is irrelevant to conclude who is
>>better in a chessmatch .
>>
>>Peter
>
>It's not that simple, due to the nature of chess.
>
>In chess, a match result of 2-0 with 0 draws is less significant than a match
>result of 2-0 with 8 draws.

I disagree.

If you only want to answer the question of who is best, both results give the
same information.
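
To make that concrete, here is a minimal sketch of one way to compute it. It is
not taken from Whoisbest.pdf; it assumes a uniform prior over the
(win, loss, draw) probabilities and independent games, which is exactly the
assumption being disputed in this thread. Under that assumption the share of
decisive games won follows a Beta(wins+1, losses+1) distribution, so the number
of draws drops out. The function name and its parameters are hypothetical.

```python
from fractions import Fraction
from math import comb


def prob_first_is_better(wins: int, losses: int, draws: int = 0) -> Fraction:
    """Posterior probability that the first player's win rate exceeds its loss rate.

    Assumes independent games and a uniform prior (an assumption of this sketch).
    Then P(better) = P(X > 1/2) with X ~ Beta(wins + 1, losses + 1), which for
    integer parameters equals a binomial tail sum.
    `draws` is accepted only to show that it does not enter the result.
    """
    n = wins + losses + 1
    favourable = sum(comb(n, k) for k in range(wins + 1))
    return Fraction(favourable, 2 ** n)


# 2-0 with no draws and 2-0 with 8 draws give the same answer under this model.
print(float(prob_first_is_better(2, 0, 0)))  # 0.875
print(float(prob_first_is_better(2, 0, 8)))  # 0.875
# The 5-0 case mentioned earlier in the thread.
print(float(prob_first_is_better(5, 0)))     # 0.984375
```

If draws are not independent of playing strength, as argued in the quoted
message, this model no longer applies and the draws could carry information.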

>
>WhoIsBest makes the assumption that draws are independent events - that is, that
>wins, losses and draws each come with some independent probability. In fact, in
>a +2 -0 =8 result, the chance is that the side with the +2 was "stronger" in the
>draws - ie. closer to winning.

It is not clear without looking at the games.


>Chess has this phenomenon where the stronger side
>tries to break through the draw barrier, and sometimes cannot.

It may also be the case that one side is better in the middlegame and the other
side is better in the endgame, so the side that is better in the endgame often
saves inferior positions.

Uri





Last modified: Thu, 15 Apr 21 08:11:13 -0700
