Author: Amir Ban
Date: 13:36:57 01/28/00
On January 28, 2000 at 04:07:44, Christophe Theron wrote:

<snip>

>I have just run it. My sample is 1000 matches. Each match is made of 200 games.
>My program tells me that with 200 games I can only be sure that one program is
>stronger if the elo difference of the two is above 35 elo points, and this is
>sure with a 93.5% confidence.
>
>If the programs are closer than 35 elo points, 200 games are not enough to be
>sure which is best.
>
>Number of matches: 1000
>Number of games in each match: 200
>Compute probability of error greater than: 5
>
>
>
>    Christophe

Something is wrong with the numbers here: 200x1000 games are enough to establish a rating with a 95% confidence margin of 1.5 points. If two programs are 35 points apart, you would need only about 400 games to tell with 95% confidence which is better.

This also fails to say something important: the greater the difference in strength, the fewer games are needed to prove who is better. If the players are 100 points apart, only about 50 games are needed. A 200 point difference would show up almost immediately.

I think there's also a logical trap that even the smartest fall into. When people see, for example, the SSDF list and its 95% confidence intervals, they jump to the conclusion that if the point spread is within this interval, it has NO significance, which is not true. I can very well make statements based on only 80% (gasp!) probability. I expect to be right 80% of the time, and in most cases I will pass for a very smart person.

Amir
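For anyone who wants to check the figures above, here is a rough sketch of the arithmetic. It is not the program discussed in the quoted post. It assumes the standard logistic Elo expectation, a normal approximation, a per-game standard deviation of 0.5 (no draws, which if anything overstates the noise), and a two-sided 95% level (z = 1.96). The function names are made up for illustration.

```python
import math

def expected_score(elo_diff):
    """Expected score of the stronger side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def games_needed(elo_diff, z=1.96, sigma=0.5):
    """Games needed before the expected edge exceeds z standard errors.

    sigma=0.5 is an assumed per-game standard deviation (no draws).
    """
    edge = expected_score(elo_diff) - 0.5
    return math.ceil((z * sigma / edge) ** 2)

def elo_margin(n_games, z=1.96, sigma=0.5):
    """Approximate confidence margin in Elo points for a score near 50%."""
    # Slope of the Elo curve at a 50% score: about 695 Elo per unit of score.
    slope = 400.0 / (math.log(10) * 0.25)
    return z * sigma / math.sqrt(n_games) * slope

if __name__ == "__main__":
    for diff in (35, 100, 200):
        print(diff, "Elo apart:", games_needed(diff), "games")
    print("margin after 200x1000 games:", round(elo_margin(200 * 1000), 2), "Elo")
```

Under these assumptions the results land close to the numbers quoted above: a little under 400 games for 35 points, about 50 for 100 points, and a margin of about 1.5 points for 200,000 games. Passing a smaller z (for example z=1.28 for roughly 80% two-sided) shows how quickly the required number of games drops when a lower confidence level is acceptable.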
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.