Computer Chess Club Archives



Subject: Re: I will continue the match until there is a difference of 7 games

Author: Robert Hyatt

Date: 19:12:57 12/18/00



On December 18, 2000 at 21:04:37, Christophe Theron wrote:

>On December 18, 2000 at 17:43:43, Severi Salminen wrote:
>
>>On December 18, 2000 at 10:48:49, Jorge Pichard wrote:
>>
>>>On December 18, 2000 at 09:55:42, Severi Salminen wrote:
>>>
>>>>>I agree with you that 24 games isn't enough, but 200 games is not really
>>>>>necessary if one of the two programs reaches a lead of more than 7 games,
>>>>>at which point I will stop the match. Most likely this won't happen, since
>>>>>these two programs have been too evenly matched so far.
>>>>
>>>>I don't understand. Where do you get that 7? Are you saying that the result
>>>>104-96 is significant? Or, even worse, 16-8 (this means nothing in practice)?
>>>>Why not 8, 25 or 10056? I think there is no point in stopping when the
>>>>difference reaches some fixed number. There _is_ a point in running a match
>>>>with many games (500+). The closer the two programs are, the more games you
>>>>need to show the true difference. Also, the learning abilities of both
>>>>programs have to be taken into account. The chess community still seems to
>>>>lack the knowledge of how to measure the strength difference between two
>>>>programs...
>>>>
>>>>Severi
>>>
>>>Okay I will run this tourney up to 200 games, and will post the result as soon
>>>as the tourney is over, or will Email the PGN games to anybody interested.
>>
>>That begins to sound interesting. A 200-game match still has some margin of
>>error, but we'll see a lot from that result. I'm looking forward to the
>>results - not too often does someone run a 200+ game match here in CCC, thanks!
>>
>>Severi
>
>
>
>Over 200 games, the margin of error for 80% reliability is +/-3.5%.
>For 70% reliability it's +/-3.0%.
>
>If a program wins the 200-game match by 53.5% (107-93) or more, you can say
>with 80% reliability that it is stronger than its opponent.
>
>If it wins by only 53% (106-94) you can say it is better, but only with 70%
>reliability.
>
>You see that when the programs are very close, you need a very large number of
>games to determine which is better.
>
>On the other hand, if there is a significant difference before you reach 200
>games, it is possible to say which is better without playing all 200 games.
>
>
>
>    Christophe
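
One way to sanity-check figures like these is to ask how often two exactly equal
programs would produce a given score by chance. The sketch below is only an
illustration under simplifying assumptions that are not in the post: every game
is decisive (no draws), games are independent, and each side wins with
probability 0.5. The function name tail_probability is made up for this example,
and its output need not match the percentages quoted above, which depend on how
draws and the confidence level are handled.

```python
from math import comb


def tail_probability(wins: int, games: int) -> float:
    """Chance that one of two equal programs scores at least `wins` wins.

    Assumes no draws and independent games with a 50% win chance each,
    so the number of wins follows a Binomial(games, 0.5) distribution.
    """
    favourable = sum(comb(games, k) for k in range(wins, games + 1))
    return favourable / 2 ** games


if __name__ == "__main__":
    # How often does pure chance produce these scores in a 200-game match?
    for wins in (106, 107, 110, 115):
        print(f"{wins}-{200 - wins} or better: {tail_probability(wins, 200):.3f}")
```

The smaller the printed probability, the harder it is to explain the score as
luck between equal opponents.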


The problem is defining "significant".  To see why, take the results of a
200 game match, and define a string of 0's and 1's such that 0 means program
X lost, and 1 means it won.  Then look at the largest number of consecutive
1's or 0's, and the result will be alarming.  For example, in the match posted,
at one point Nimzo was 3 points ahead of Crafty, yet it ended dead even.  I
wouldn't be surprised to see one program 10 ahead or behind at some point in a
200 game match, and still see the match finish in a dead heat.
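
The experiment described above can be tried without playing any chess. The
following is a rough sketch, not a result from the match discussed: it assumes
two exactly equal programs, no draws, and independent games, and it uses
made-up names (simulate_match, summarize). It builds the string of 0's and 1's
at random, then records the longest streak, the largest lead either side held,
and the final margin.

```python
import random


def simulate_match(games=200, p_win=0.5, rng=None):
    """Play one simulated match; return (max_lead, longest_streak, final_margin).

    Each game is a win (1) or a loss (0) for program X; draws are ignored,
    which is an assumption made for this illustration.
    """
    if rng is None:
        rng = random.Random()
    lead = 0          # X's wins minus X's losses so far
    max_lead = 0      # largest absolute lead seen for either side
    longest = 0       # longest run of identical results
    run = 0
    previous = None
    for _ in range(games):
        won = rng.random() < p_win
        lead += 1 if won else -1
        max_lead = max(max_lead, abs(lead))
        run = run + 1 if won == previous else 1
        longest = max(longest, run)
        previous = won
    return max_lead, longest, lead


def summarize(trials=2000, games=200, seed=0):
    """Fraction of simulated matches in which some side led by 10 or more,
    and the fraction that did so and still finished within 2 games of even."""
    rng = random.Random(seed)
    results = [simulate_match(games, 0.5, rng) for _ in range(trials)]
    big_lead = sum(1 for m, _, _ in results if m >= 10) / trials
    big_lead_then_even = sum(
        1 for m, _, final in results if m >= 10 and abs(final) <= 2
    ) / trials
    return big_lead, big_lead_then_even


if __name__ == "__main__":
    print(simulate_match(rng=random.Random(1)))
    print(summarize())
```

Running summarize() with different seeds gives a feel for how common large
temporary leads are between equal opponents.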



