Computer Chess Club Archives



Subject: Re: Spike 1.0 Mainz is too strong for Zappa 1.1 so far 16 to 10

Author: Maurizio De Leo

Date: 09:21:20 08/30/05




>Under valid and controlled conditions it still seems logical to me to stop a
>test after a 5-0 result and conclude that the winning program is probably the
>stronger one.

>>I don't put much credence in any result of less than 30 games.
>>After 30 games, then you get a lot more plausibility.

>You didn't give any reason for this, so I don't understand. A 6-0 says more
>about engine strength than the above match result with over 100000 games.

Dann is right, I think.
The confidence interval calculation treats the score of a single game as a
random variable with a mean between -1 and 1 (a function of the Elo
difference between the programs) and some standard deviation. If the games
are independent, the total score approaches (mean * number of games), and the
standard deviation of the average score per game shrinks in proportion to
1/sqrt(number of games). With enough games, you reach roughly 95% confidence
once the performance difference between the two programs exceeds about 2
standard deviations (3 standard deviations corresponds to about 99.7%).
However, this assumes a normal distribution. By the central limit theorem,
that assumption is reasonable for the sum of any repeated random variable as
long as the experiments are independent and there are "enough" of them. Most
statistics books give 30 as the rule of thumb for "enough".
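
To make the calculation concrete, here is a minimal sketch in Python of the
normal-approximation interval described above. It assumes each game scores +1
for a win, 0 for a draw and -1 for a loss. The function name match_summary and
the win/draw/loss counts in the example are hypothetical, chosen only for
illustration, and the 1.96 multiplier is the usual two-sided 95% value for a
normal distribution.

```python
import math


def match_summary(wins, draws, losses, z=1.96):
    """Return (mean, standard_error, (low, high)) for the per-game score.

    Scores are +1 (win), 0 (draw), -1 (loss). The interval uses the normal
    approximation, so it is only trustworthy when there are "enough" games.
    """
    n = wins + draws + losses
    if n < 2:
        raise ValueError("need at least 2 games to estimate a spread")

    mean = (wins - losses) / n
    # E[x^2] is the fraction of decisive games, since (+1)^2 == (-1)^2 == 1.
    variance = (wins + losses) / n - mean ** 2
    # Bessel's correction turns this into the sample variance.
    variance = max(variance, 0.0) * n / (n - 1)
    # The standard error of the mean shrinks as 1/sqrt(n).
    std_err = math.sqrt(variance / n)
    return mean, std_err, (mean - z * std_err, mean + z * std_err)


# Hypothetical inputs: 16 wins, 10 losses, no draws.
print(match_summary(16, 0, 10))
# Hypothetical 5-0 sweep: the estimated spread collapses to zero.
print(match_summary(5, 0, 0))
```

With the hypothetical 16 wins and 10 losses, the interval still includes 0, so
under these assumptions the result does not separate the two programs at the
95% level. For a 5-0 sweep the sample standard deviation is exactly 0, so the
formula returns an interval of zero width. That is an artifact of the small
sample, not real certainty, and it is one way to see why the normal
approximation should not be trusted with so few games.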

Maurizio




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.