Author: Dann Corbit
Date: 17:40:21 01/23/04
On January 23, 2004 at 20:38:01, Dann Corbit wrote:

>On January 23, 2004 at 20:00:30, Rolf Tueschen wrote:
>
>>On January 23, 2004 at 18:33:52, Dann Corbit wrote:
>>
>>>On January 23, 2004 at 18:20:34, Russell Reagan wrote:
>>>
>>>>On January 23, 2004 at 15:24:31, Dann Corbit wrote:
>>>>
>>>>>30 experiments is a fairly standard rule as to when you should start to trust
>>>>>the results for experimental data.
>>>>
>>>>So what does this mean for chess engine matches? You need at least 30 games? Or
>>>>30 matches? If matches, how do you determine how long each match should be?
>>>
>>>It means less than 30 games and you cannot trust the answer.
>>>With more than 30 games, confidence rises.
>>>
>>>I bring up the number 30 because it is important in this case. If you run (for
>>>instance) a 15 game contest, it would be dangerous to try to draw conclusions
>>>from it. With 30 games or more, even something that does not perfectly model a
>>>normal distribution will start to conform to the right answers (e.g. the mean
>>>calculation will be about right. The standard deviations will be about right
>>>unless sharply skewed).
>>>
>>>30 games is the break even limit where deficiencies in the choice of a normal
>>>distribution as a model start to become smoothed over.
>>
>>
>>About what measurements you are talking here? Of course the N is right for
>>normally distributed variables but what do you "measure" with chess games?
>
>+1, -1, 0

I suppose this is an odd statement. Perhaps many will think I am off my rocker.
I imagine that I meant to say 1-0, 0-1, 1/2-1/2 so that it can be familiar. But
we have a three state outcome, at any rate.

>
>>Second question: when you have almost equally strong chess programs you are
>>implying that after 30 games you can make a sound conclusion which one is
>>stronger?
>
>No. After 30 games you can start to believe the measurements.
>
>>- If you think you can answer with YES, then I doubt it.
>
>So do I.
>
>> Of course, if
>>you then do - what the SSDF is doing - matches between two unequal progs you can
>>well get clear results after 5 games.
>
>But only by accident.
>
>>But of course 30 games will be a good
>>profit for the better program.
>
>It will be a good start.
>
>>Also if tests have been known where a directly
>>concurring prog usually gets less points against this out-dated prog...
>>
>>What I want to say is this. It was often explained here in CCC. For good results
>>you must enter into some thousand games mode. 30 games is just for laughter. It
>>is an irrelevant species.
>
>30 games is the bare minimum number to the point where the number may have a
>tiny scrap of validity. If you use less than that, you can be reporting pure
>nonsense.
>
>This is especially the case because we know that it is not exactly a Gaussian
>distribution.
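
To put rough numbers on the three state outcome and the "30 games" threshold, here is a minimal sketch. It is not from the thread. It assumes each game is scored 1, 0.5 or 0, that games are independent, and that a normal approximation to the mean score is acceptable (the very assumption being debated above). The function name `score_interval` and the win/draw/loss counts are made up for illustration.

```python
import math

def score_interval(wins, draws, losses, z=1.96):
    """Return (mean score, lower bound, upper bound) for a match result.

    Each game is scored 1 (win), 0.5 (draw) or 0 (loss). The bounds are a
    normal-approximation interval; z=1.96 corresponds to roughly 95%.
    """
    n = wins + draws + losses
    if n == 0:
        raise ValueError("need at least one game")
    mean = (wins + 0.5 * draws) / n
    # Variance of the three-valued per-game score around its mean.
    var = (wins * (1.0 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * (0.0 - mean) ** 2) / n
    stderr = math.sqrt(var / n)  # standard error of the mean score
    return mean, mean - z * stderr, mean + z * stderr

# Hypothetical results: the same scoring rate over 30 games and over 1000 games.
for wins, draws, losses in [(12, 10, 8), (400, 333, 267)]:
    mean, low, high = score_interval(wins, draws, losses)
    games = wins + draws + losses
    print(f"{games:5d} games: score {mean:.3f}, approx. 95% interval {low:.3f} to {high:.3f}")
```

With these invented counts the 30-game interval still straddles 0.5, so it cannot say which program is stronger, while the 1000-game interval at the same scoring rate lies entirely above 0.5. The interval narrows in proportion to 1/sqrt(n), which is why both "30 is a bare minimum" and "you need thousands of games" can be argued from the same arithmetic.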
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.