Computer Chess Club Archives


Subject: Re: Spike 1.0 Mainz is too strong for Zappa 1.1 so far 16 to 10

Author: Vasik Rajlich

Date: 05:58:10 08/31/05


On August 31, 2005 at 04:59:18, Uri Blass wrote:

>On August 31, 2005 at 04:52:11, Vasik Rajlich wrote:
>
>>On August 30, 2005 at 12:27:52, Peter Berger wrote:
>>
>>>On August 30, 2005 at 12:21:20, Maurizio De Leo wrote:
>>>
>>>>
>>>>>Under valid and controlled conditions it still seems logical to me to stop a
>>>>>test after a 5-0 result and conclude that the winning program is probably the
>>>>>stronger one.
>>>>
>>>>>>I don't put much credence in any result of less than 30 games.
>>>>>>After 30 games, then you get a lot more plausibility.
>>>>
>>>>>You didn't give any reason for this, so I don't understand. A 6-0 says more
>>>>>about engine strength than the above match result with over 100000 games.
>>>>
>>>>Dann is right, I think.
>>>>The confidence interval calculation assumes that the score of a game is a
>>>>random variable with a mean value between 1 and -1 (a function of the Elo
>>>>difference between the programs) and a standard deviation. Then, if the
>>>>games are independent, the sum of the points will approximate the product
>>>>(mean * number of games), with a relative standard deviation that shrinks as
>>>>more games are played.
>>>>With enough games the "confidence" will get to 95% when the performance
>>>>difference between the two programs is more than about 2 standard deviations.
>>>>However, this assumes a normal distribution. The assumption can be made for any
>>>>repeated random variable as long as the experiments are independent and there
>>>>are "enough" of them. This "enough" is indeed given in many statistics books as 30.
>>>>
>>>>Maurizio
>>>
>>>Please have a look at "WhoisBest.zip" at Rémi Coulom's Home Page:
>>>http://remi.coulom.free.fr/. It includes a little paper, Whoisbest.pdf, on
>>>"Statistical Significance of a Match", with a very straightforward mathematical
>>>proof that, for example, the number of draws is irrelevant to concluding who is
>>>better in a chess match.
>>>
>>>Peter
>>
>>It's not that simple, due to the nature of chess.
>>
>>In chess, a match result of 2-0 with 0 draws is less significant than a match
>>result of 2-0 with 8 draws.
>
>I disagree.
>
>If you want only to answer the question who is best both results give the same
>information.
>
>>
>>WhoIsBest makes the assumption that draws are independent events - that is, that
>>wins, losses and draws each come with some independent probability. In fact, in
>>a +2 -0 =8 result, chances are that the side with the +2 was "stronger" in the
>>draws - i.e. closer to winning.
>
>It is not clear without looking at the games.
>
>
>>Chess has this phenomenon where the stronger side
>>tries to break through the draw barrier, and sometimes cannot.
>
>It may also be the case that one side is better in the middlegame and the other
>side is better in the endgame, so the side that is better in the endgame often
>saves inferior positions.
>
>Uri

Yes, stronger players also escape with draws, etc. Here we are talking purely
about statistics.
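
For reference, the draw-independence claim under discussion can be sketched in a
few lines. This is only an illustration under an assumed model (wins, draws and
losses as independent outcomes with a uniform prior over their probabilities),
not a transcription of WhoIsBest; the function name is made up for this example.
Under that assumption the draw count drops out of the answer entirely, which is
exactly the point being disputed above.

```python
from math import comb


def likelihood_of_superiority(wins: int, losses: int, draws: int = 0) -> float:
    """P(win probability > loss probability), given the match result.

    Assumption (not from the thread): independent games and a uniform prior
    over the (win, draw, loss) probabilities. The posterior share of decisive
    games that are wins is then Beta(wins + 1, losses + 1), so `draws` has no
    effect. It is accepted only to make that visible.
    """
    n = wins + losses + 1
    # For integer parameters, P(Beta(wins + 1, losses + 1) > 1/2) is a
    # binomial tail sum.
    return sum(comb(n, k) for k in range(wins + 1)) / 2 ** n


if __name__ == "__main__":
    for wins, losses, draws in [(2, 0, 0), (2, 0, 8), (5, 0, 0), (6, 0, 0)]:
        p = likelihood_of_superiority(wins, losses, draws)
        print(f"+{wins} -{losses} ={draws}: {p:.4f}")
```

In this model +2 -0 =0 and +2 -0 =8 give the same number. If draws carry
information about who was pressing, the independence assumption fails and the
two results no longer mean the same thing.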

If you're really curious, try the following experiment: take a sample of drawn
games, evaluate the final positions (or some range of close-to-final positions),
and see if there is a correlation between having a higher rating and having
better positions.

I'm pretty sure that this correlation exists.
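
A minimal sketch of how such an experiment could be scored, assuming you already
have, for each drawn game, both ratings and an engine evaluation of the final
position from White's point of view (in centipawns). The record layout and the
function names are hypothetical, and how the evaluations are obtained is left
open. A clearly positive coefficient would support the claim.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / sqrt(sxx * syy)


def rating_eval_correlation(drawn_games):
    """Correlate rating advantage with final-position advantage.

    Each record is (white_rating, black_rating, final_eval_cp), where
    final_eval_cp is the evaluation of the final position from White's
    point of view. This layout is an assumption made for this sketch.
    """
    rating_diffs = [white - black for white, black, _ in drawn_games]
    evals = [eval_cp for _, _, eval_cp in drawn_games]
    return pearson(rating_diffs, evals)
```

Averaging the evaluation over a range of close-to-final positions, rather than
taking only the last one, would fit the same function: pass the averaged value
as the third field.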

Vas




Last modified: Thu, 15 Apr 21 08:11:13 -0700