Author: Robert Hyatt
Date: 19:12:57 12/18/00
On December 18, 2000 at 21:04:37, Christophe Theron wrote:

>On December 18, 2000 at 17:43:43, Severi Salminen wrote:
>
>>On December 18, 2000 at 10:48:49, Jorge Pichard wrote:
>>
>>>On December 18, 2000 at 09:55:42, Severi Salminen wrote:
>>>
>>>>>I agree with you that 24 games isn't enough, but 200 games is not really
>>>>>necessary if one of the two programs reach a difference of over 7 games, in
>>>>>which at that point I will stop the match. More likely this won't happen since
>>>>>these two programs are too evenly matched so far.
>>>>
>>>>I don't understand. Where do you get that 7? Are you saying that the result
>>>>104-96 is significant? Or, even worse, 16-8 (this means nothing in practice)?
>>>>Why not 8, 25 or 10056? I think there is no point to stop when difference is
>>>>something. There _is_ a point to run a match with many games (500+). The closer
>>>>the two programs are the more games you need to show the true difference. Also
>>>>the learning abilities of both programs have to be taken into account. The chess
>>>>community still seems to lack the knowledge on how to measure the strength
>>>>difference between two programs...
>>>>
>>>>Severi
>>>
>>>Okay I will run this tourney up to 200 games, and will post the result as soon
>>>as the tourney is over, or will Email the PGN games to anybody interested.
>>
>>That begins to sound interesting. 200 games match still has some error margins
>>but we'll see a lot from that result. I'm looking forward for the results - not
>>too often someone runs a 200+ match here in CCC, thanks!
>>
>>Severi
>
>On 200 games, the margin of error for 80% reliability is +/-3.5%.
>For 70% reliability it's +/-3.0%.
>
>If a program wins the 200 games match by 53.5% (107-93) or more, you can say
>with 80% reliability that it is stronger than its opponent.
>
>If it wins by only 53% (106-94) you can say it is better, but only with 70%
>reliability.
>
>You see that when the programs are very close you need a very large number of
>games to determine which is the best.
>
>On the other hand, if there is a significant difference before you reach 200
>games, it is possible to say which is the best without playing the 200 games.
>
>    Christophe

The problem is defining "significant". To see why, take the results of a 200
game match and write them as a string of 0's and 1's, where 0 means program X
lost and 1 means it won. Then look at the longest run of consecutive 1's or
0's, and the result will be alarming. For example, in the match posted, at one
point Nimzo was 3 points ahead of Crafty, yet the match ended dead even. I
wouldn't be surprised to see one program 10 ahead or behind at some point in a
200 game match, and still see the match finish in a dead heat.
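For anyone who wants to try the 0/1 string experiment, here is a minimal Python sketch. It is only an illustration under assumptions that are not in the posts above: the two programs are exactly equal (each game is a fair coin flip), there are no draws, and games are independent (no learning). The function names are made up for this example.

```python
import random


def longest_run(results):
    """Length of the longest streak of identical consecutive results."""
    best = current = 0
    previous = None
    for r in results:
        # Extend the streak if this result matches the last one, else restart.
        current = current + 1 if r == previous else 1
        best = max(best, current)
        previous = r
    return best


def max_lead(results):
    """Largest lead (wins minus losses) either program held at any point."""
    lead = peak = 0
    for r in results:
        lead += 1 if r == 1 else -1
        peak = max(peak, abs(lead))
    return peak


def simulate_match(games=200, p_win=0.5, rng=None):
    """One match as a list of 0's and 1's (ASSUMPTION: no draws, independent games)."""
    rng = rng or random.Random()
    return [1 if rng.random() < p_win else 0 for _ in range(games)]


def summarize(trials=2000, games=200, rng=None):
    """Average longest streak and largest lead over many equal-strength matches,
    plus how many matches saw a lead of 10 or more yet ended exactly even."""
    rng = rng or random.Random()
    total_run = total_lead = big_lead_then_even = 0
    for _ in range(trials):
        match = simulate_match(games, 0.5, rng)
        lead = max_lead(match)
        total_run += longest_run(match)
        total_lead += lead
        if lead >= 10 and 2 * sum(match) == games:
            big_lead_then_even += 1
    return {
        "avg_longest_run": total_run / trials,
        "avg_max_lead": total_lead / trials,
        "lead_10_plus_then_even": big_lead_then_even,
    }


if __name__ == "__main__":
    print(summarize())
```

Running it a few times and looking at how large the streaks and mid-match leads get between two identical "programs" shows why stopping a match at a fixed lead is risky. Real matches have draws, which this sketch ignores.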