Author: Christophe Theron
Date: 08:47:43 12/19/00
On December 18, 2000 at 22:12:57, Robert Hyatt wrote:
>On December 18, 2000 at 21:04:37, Christophe Theron wrote:
>
>>On December 18, 2000 at 17:43:43, Severi Salminen wrote:
>>
>>>On December 18, 2000 at 10:48:49, Jorge Pichard wrote:
>>>
>>>>On December 18, 2000 at 09:55:42, Severi Salminen wrote:
>>>>
>>>>>>I agree with you that 24 games isn't enough, but 200 games is not really
>>>>>>necessary if one of the two programs reaches a lead of more than 7 games, at
>>>>>>which point I will stop the match. Most likely this won't happen, since the
>>>>>>two programs have been very evenly matched so far.
>>>>>
>>>>>I don't understand. Where do you get that 7? Are you saying that the result
>>>>>104-96 is significant? Or, even worse, 16-8 (this means nothing in practice)?
>>>>>Why not 8, 25 or 10056? I think there is no point in stopping just because
>>>>>the difference reaches some value. There _is_ a point in running a match with
>>>>>many games (500+). The closer the two programs are, the more games you need to
>>>>>show the true difference. The learning abilities of both programs also have to
>>>>>be taken into account. The chess community still seems to lack the knowledge
>>>>>of how to measure the strength difference between two programs...
>>>>>
>>>>>Severi
>>>>
>>>>Okay I will run this tourney up to 200 games, and will post the result as soon
>>>>as the tourney is over, or will Email the PGN games to anybody interested.
>>>
>>>That begins to sound interesting. A 200-game match still has some margin of
>>>error, but we'll learn a lot from that result. I'm looking forward to the
>>>results; it's not often that someone runs a 200+ game match here in CCC. Thanks!
>>>
>>>Severi
>>
>>
>>
>>On 200 games, the margin of error for 80% reliability is +/-3.5%.
>>For 70% reliability it's +/-3.0%.
>>
>>If a program wins the 200-game match with a score of 53.5% (107-93) or more,
>>you can say with 80% reliability that it is stronger than its opponent.
>>
>>If it scores only 53% (106-94), you can say it is better, but only with 70%
>>reliability.
>>
>>You see that when the programs are very close you need a very large number of
>>games to determine which is the best.
>>
>>On the other hand, if there is a significant difference before you reach 200
>>games, it is possible to say which is the best without playing the 200 games.
>>
>>
>>
>> Christophe
>
>
>The problem is defining "significant". To see why, take the results of a
>200-game match and write them as a string of 0's and 1's, where 0 means program
>X lost and 1 means it won. Then look at the longest run of consecutive 1's or
>0's, and the result will be alarming. For example, in the match posted, at one
>point Nimzo was 3 points ahead of Crafty, yet it ended dead even. I wouldn't
>be surprised to see one program 10 points ahead or behind in a 200-game match
>that still finishes in a dead heat.

That's right, Bob, and that's why I mentioned the figures for 80% and 70%
reliability:

"If a program wins the 200-game match with a score of 53.5% (107-93) or more,
you can say with 80% reliability that it is stronger than its opponent."

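As a rough cross-check of figures like these, the sketch below uses the "likelihood of superiority" formula common in engine testing: a normal approximation over decisive games only, LOS = Φ((W - L) / √(W + L)). The function name is hypothetical and draws are simply ignored by this model. Under it, 107-93 with every game decisive comes out near 84%, and any draws push the figure higher, so the 80% quoted above looks conservative under this approximation; other models (for example ones that put a prior on the rating gap) will give different numbers.

```python
import math


def likelihood_of_superiority(wins: int, losses: int) -> float:
    """Approximate probability that the program with `wins` wins is the stronger one.

    Normal approximation over decisive games only: under equal strength each
    decisive game is +1 or -1 with probability 1/2, so wins - losses has
    variance wins + losses. Draws are ignored by this model.
    """
    decisive = wins + losses
    if decisive == 0:
        return 0.5  # no decisive games: no evidence either way
    z = (wins - losses) / math.sqrt(decisive)
    # Standard normal CDF written with math.erf.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# 107-93 and 106-94, assuming (hypothetically) that no game was drawn.
print(f"{likelihood_of_superiority(107, 93):.3f}")  # 0.839
print(f"{likelihood_of_superiority(106, 94):.3f}")  # 0.802
# The same 107-93 score reached with 60 draws: 77 wins, 60 draws, 63 losses.
print(f"{likelihood_of_superiority(77, 63):.3f}")
```

The draw count matters: the same 107-93 score carries more evidence when fewer games were decisive, because the same net lead was produced by fewer coin-flip-like results.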
Winning a 200-game match 107-93 (7 points above an even score) means there is
an 80% chance that the program is better than its opponent. But even when that
happens, there is still a 20% chance that it is not better. This accounts for
the case you mention.

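Bob's point about large swings between equal programs is easy to reproduce with a quick Monte Carlo sketch. It assumes two programs of exactly equal strength and a hypothetical 30% draw rate, skips draws when building the 0/1 win/loss string, and measures how often the lead ever reaches 8 points (the "more than 7 games" stopping rule proposed earlier in the thread) as well as the longest streak of identical decisive results. The function names and parameter values are illustrative assumptions, not anything used in the actual match.

```python
import random


def simulate_match(n_games: int, draw_rate: float, rng: random.Random):
    """Play one match between two programs of equal strength.

    Returns (final_lead, peak_lead, longest_streak), where lead = wins - losses
    for program X, peak_lead is the largest |lead| seen at any point, and
    longest_streak is the longest run of identical decisive results (draws
    are skipped, as in a 0/1 win/loss string).
    """
    lead = peak_lead = 0
    longest = current = 0
    last = None
    for _ in range(n_games):
        r = rng.random()
        if r < draw_rate:
            continue  # a draw changes neither the lead nor the 0/1 string
        won = r < draw_rate + (1.0 - draw_rate) / 2.0  # win and loss equally likely
        lead += 1 if won else -1
        peak_lead = max(peak_lead, abs(lead))
        current = current + 1 if won == last else 1
        last = won
        longest = max(longest, current)
    return lead, peak_lead, longest


def stop_rule_rate(n_matches: int, n_games: int, draw_rate: float,
                   threshold: int, seed: int = 0) -> float:
    """Fraction of equal-strength matches whose |lead| ever reaches threshold."""
    rng = random.Random(seed)
    hits = sum(
        1 for _ in range(n_matches)
        if simulate_match(n_games, draw_rate, rng)[1] >= threshold
    )
    return hits / n_matches


if __name__ == "__main__":
    rng = random.Random(42)
    final, peak, streak = simulate_match(200, 0.3, rng)
    print(f"one match: final lead {final}, peak lead {peak}, longest streak {streak}")
    rate = stop_rule_rate(2_000, 200, 0.3, threshold=8)
    print(f"an 8-point lead appears in {rate:.0%} of matches between equal programs")
```

With settings like these, an 8-point lead should show up in a large share of matches even though neither program is stronger. This is why stopping a match the moment a fixed lead appears overstates significance: the check is repeated after every game, so the chance of a false "winner" is much higher than for a single test at the end.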
For 95% reliability, the number of games to play would increase dramatically,
and even then the match would not make you absolutely sure...

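To see how quickly the required number of games grows, here is a sketch of the usual normal-approximation sample-size estimate: the match length at which a score of 50% + edge would just reach a given one-sided confidence, using per-game variance (1 - d)/4 for two nearly equal programs with draw rate d. The function name and the no-draw default are assumptions. A program with exactly that true edge would clear the bar only about half the time at this length, so treat the result as a lower bound.

```python
import math
from statistics import NormalDist


def games_needed(edge: float, confidence: float, draw_rate: float = 0.0) -> int:
    """Games after which a score of 0.5 + edge just reaches `confidence`.

    One-sided normal approximation for two nearly equal programs: each game
    scores 1, 0.5 or 0 with per-game variance (1 - draw_rate) / 4, so the
    standard error of the score after n games is sqrt((1 - draw_rate) / (4 n)).
    Solving z * stderr = edge for n gives the formula below.
    """
    z = NormalDist().inv_cdf(confidence)
    return math.ceil((1.0 - draw_rate) / 4.0 * (z / edge) ** 2)


for conf in (0.70, 0.80, 0.95):
    for edge in (0.035, 0.02, 0.01):
        print(f"confidence {conf:.0%}, edge {edge:.1%}: {games_needed(edge, conf)} games")
```

Halving the edge roughly quadruples the number of games, and moving from 80% to 95% confidence multiplies it by about (1.645 / 0.842)² ≈ 3.8, which is the dramatic growth described above.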
Christophe
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.