Author: Dann Corbit
Date: 17:38:01 01/23/04
On January 23, 2004 at 20:00:30, Rolf Tueschen wrote:

>On January 23, 2004 at 18:33:52, Dann Corbit wrote:
>
>>On January 23, 2004 at 18:20:34, Russell Reagan wrote:
>>
>>>On January 23, 2004 at 15:24:31, Dann Corbit wrote:
>>>
>>>>30 experiments is a fairly standard rule as to when you should start to trust
>>>>the results for experimental data.
>>>
>>>So what does this mean for chess engine matches? You need at least 30 games? Or
>>>30 matches? If matches, how do you determine how long each match should be?
>>
>>It means less than 30 games and you cannot trust the answer.
>>With more than 30 games, confidence rises.
>>
>>I bring up the number 30 because it is important in this case. If you run (for
>>instance) a 15 game contest, it would be dangerous to try to draw conclusions
>>from it. With 30 games or more, even something that does not perfectly model a
>>normal distribution will start to conform to the right answers (e.g. the mean
>>calculation will be about right. The standard deviations will be about right
>>unless sharply skewed).
>>
>>30 games is the break even limit where deficiencies in the choice of a normal
>>distribution as a model start to become smoothed over.
>
>
>About what measurements you are talking here? Of course the N is right for
>normally distributed variables but what do you "measure" with chess games?

+1, -1, 0 (win, loss, draw).

>Second question: when you have almost equally strong chess programs you are
>implying that after 30 games you can make a sound conclusion which one is
>stronger?

No. After 30 games you can start to believe the measurements.

>- If you think you can answer with YES, then I doubt it.

So do I.

>Of course, if
>you then do - what the SSDF is doing - matches between two unequal progs you can
>well get clear results after 5 games.

But only by accident.

>But of course 30 games will be a good
>profit for the better program.

It will be a good start.
>Also if tests have been known where a directly
>concurring prog usually gets less points against this out-dated prog...
>
>What I want to say is this. It was often explained here in CCC. For good results
>you must enter into some thousand games mode. 30 games is just for laughter. It
>is an irrelevant species.

30 games is the bare minimum at which the result may have a tiny scrap of validity. With fewer games than that, you can be reporting pure nonsense. This is especially true because we know that the distribution of game results is not exactly Gaussian.
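To make the numbers concrete, here is a minimal sketch (not from the thread) using the +1 / 0 / -1 per-game scoring mentioned above. It assumes games are independent and leans on the normal approximation being discussed, which is only rough for small samples. The function names, the 30-game match, and the win/draw/loss rates are all hypothetical. The sample-size formula is the textbook one-sample z-test estimate (5% two-sided significance, about 80% power), swapped in for illustration rather than taken from either poster.

```python
import math


def score_stats(results):
    """Return (mean, standard error) of per-game scores.

    Each result is +1 (win), 0 (draw) or -1 (loss).
    Assumes games are independent of each other.
    """
    n = len(results)
    if n < 2:
        raise ValueError("need at least two games")
    mean = sum(results) / n
    # Sample variance with Bessel's correction (divide by n - 1).
    var = sum((r - mean) ** 2 for r in results) / (n - 1)
    return mean, math.sqrt(var / n)


def confidence_interval(results, z=1.96):
    """Approximate 95% interval for the true mean score (normal approximation)."""
    mean, se = score_stats(results)
    return mean - z * se, mean + z * se


def per_game_sigma(win_rate, loss_rate):
    """Standard deviation of one game's score, given assumed win/loss rates."""
    mean = win_rate - loss_rate
    # E[r^2] = win_rate + loss_rate, because draws contribute 0.
    return math.sqrt(win_rate + loss_rate - mean ** 2)


def games_needed(edge, sigma, z_alpha=1.96, z_beta=0.84):
    """Games needed to detect a true mean score of `edge` on the +1/-1 scale.

    Normal-approximation sample size: two-sided 5% significance (z_alpha)
    and roughly 80% power (z_beta).
    """
    return math.ceil(((z_alpha + z_beta) * sigma / edge) ** 2)


if __name__ == "__main__":
    # Hypothetical 30-game match: 10 wins, 12 draws, 8 losses.
    match = [1] * 10 + [0] * 12 + [-1] * 8
    low, high = confidence_interval(match)
    print(f"30 games: mean score in [{low:+.3f}, {high:+.3f}]")

    # Hypothetical near-equal engines: 34.5% wins, 36% draws, 29.5% losses,
    # i.e. a 52.5% score on the usual 1 / 0.5 / 0 scale (edge 0.05 here).
    sigma = per_game_sigma(0.345, 0.295)
    print("games needed:", games_needed(0.05, sigma))
```

With these made-up figures the 30-game interval runs from about -0.21 to +0.35, so it still straddles zero even though the better side won two more games. The detection estimate for a 52.5% edge comes out at about 2,000 games, which fits the "some thousand games" point made above.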
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.