Computer Chess Club Archives

Subject: Re: Spike 1.0 Mainz is too strong for Zappa 1.1 so far 16 to 10

Author: Peter Berger

Date: 09:27:52 08/30/05


On August 30, 2005 at 12:21:20, Maurizio De Leo wrote:

>
>>Under valid and controlled conditions it still seems logical to me to stop a
>>test after a 5-0 result and conclude that the winning program is probably the
>>stronger one.
>
>>>I don't put much credence in any result of fewer than 30 games.
>>>After 30 games, the result becomes much more plausible.
>
>>You didn't give any reason for this, so I don't understand. A 6-0 says more
>>about engine strength than the above match result with over 100000 games.
>
>Dann is right, I think.
>The confidence-interval calculation assumes that the score of a game is a
>random variable whose mean lies between -1 and 1 (a function of the Elo
>difference between the programs) and which has some standard deviation. If the
>games are independent, the average score per game approaches that mean, and its
>standard deviation shrinks (as 1/sqrt(number of games)) the more games are played.
>With enough games, the "confidence" reaches 95% when the observed performance
>difference between the two programs exceeds about 2 (more precisely 1.96)
>standard deviations.
>However, this assumes a normal distribution. By the central limit theorem, that
>assumption is justified for any repeated random variable as long as the
>experiments are independent and there are "enough" of them. Most statistics
>books put this "enough" at 30.
>
>Maurizio
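
For readers who want to try the normal-approximation argument quoted above, here is a minimal Python sketch. It assumes the scoring from the quote (win = +1, draw = 0, loss = -1) and independent games; the function name score_interval is hypothetical, and z = 1.96 is the usual multiplier for an approximate two-sided 95% interval.

```python
import math

def score_interval(wins: int, draws: int, losses: int, z: float = 1.96):
    """Approximate confidence interval for the mean per-game score.

    Scoring follows the quoted post: win = +1, draw = 0, loss = -1.
    z = 1.96 gives an approximate two-sided 95% interval under the
    normal approximation (central limit theorem).
    """
    n = wins + draws + losses
    if n < 2:
        raise ValueError("need at least two games")
    mean = (wins - losses) / n
    # Per-game second moment is (wins + losses) / n because x**2 == 1 for
    # decisive games and 0 for draws; n / (n - 1) is Bessel's correction.
    var = max(((wins + losses) / n - mean ** 2) * n / (n - 1), 0.0)
    # Standard error of the mean: shrinks as 1 / sqrt(n).
    stderr = math.sqrt(var / n)
    return mean - z * stderr, mean + z * stderr


# Hypothetical match: 12 wins, 8 draws, 6 losses.
low, high = score_interval(12, 8, 6)
print(f"mean score in [{low:.3f}, {high:.3f}]")
```

With those hypothetical numbers the interval still contains 0, so no winner can be declared at the 95% level. For a 5-0 sweep the sample variance is zero and the interval collapses to a single point, which is one concrete way the normal approximation breaks down on very short matches.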

Please have a look at "WhoisBest.zip" on Rémi Coulom's home page:
http://remi.coulom.free.fr/. It includes a short paper, Whoisbest.pdf, on
"Statistical Significance of a Match", with a very straightforward mathematical
proof that, for example, the number of draws is irrelevant when deciding which
program is stronger in a chess match.
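
As an illustration of why draws can drop out, here is a small Python sketch of a "likelihood of superiority" calculation. It is one common Bayesian way to formalize the idea and is not necessarily the exact derivation in Whoisbest.pdf. Assumptions: each decisive game is won by program A with an unknown probability p, p has a uniform prior, and draws carry no information about which program is stronger. The function name likelihood_of_superiority is hypothetical.

```python
from math import comb

def likelihood_of_superiority(wins: int, losses: int) -> float:
    """P(A is stronger than B | wins, losses); draws are not an input.

    Assumption: decisive games are won by A with unknown probability p,
    uniform prior on p. The posterior is Beta(wins + 1, losses + 1), and
    P(p > 1/2) equals P(Y <= wins) for Y ~ Binomial(wins + losses + 1, 1/2).
    """
    n = wins + losses + 1
    # Exact binomial tail with success probability 1/2.
    return sum(comb(n, k) for k in range(wins + 1)) / 2 ** n


print(likelihood_of_superiority(5, 0))  # the 5-0 sweep from the thread
print(likelihood_of_superiority(6, 0))  # the 6-0 sweep from the thread
```

Under these assumptions a 5-0 gives 63/64 (about 0.984) and a 6-0 gives 127/128 (about 0.992), and adding any number of draws to either score changes nothing, because draws never enter the calculation.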

Peter



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.