Computer Chess Club Archives



Subject: Re: Standard deviations -- how many games?

Author: Dann Corbit

Date: 17:38:01 01/23/04



On January 23, 2004 at 20:00:30, Rolf Tueschen wrote:

>On January 23, 2004 at 18:33:52, Dann Corbit wrote:
>
>>On January 23, 2004 at 18:20:34, Russell Reagan wrote:
>>
>>>On January 23, 2004 at 15:24:31, Dann Corbit wrote:
>>>
>>>>30 experiments is a fairly standard rule of thumb for when you should start to trust
>>>>the results of experimental data.
>>>
>>>So what does this mean for chess engine matches? Do you need at least 30 games? Or
>>>30 matches? If matches, how do you determine how long each match should be?
>>
>>It means that with fewer than 30 games you cannot trust the answer.
>>With more than 30 games, confidence rises.
>>
>>I bring up the number 30 because it is important in this case.  If you run (for
>>instance) a 15-game contest, it would be dangerous to draw conclusions
>>from it.  With 30 games or more, even something that does not perfectly model a
>>normal distribution will start to conform to the right answers (e.g., the calculated
>>mean will be about right, and the standard deviation will be about right
>>unless the distribution is sharply skewed).
>>
>>30 games is the break-even point where deficiencies in choosing a normal
>>distribution as the model start to be smoothed over.
>
>
>What measurements are you talking about here? Of course that N is right for
>normally distributed variables, but what do you "measure" with chess games?

+1, -1, 0 (win, loss, draw)
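To make that coding concrete, here is a minimal Python sketch that turns a list of +1/0/-1 game scores into a mean, a sample standard deviation and a standard error, plus a rough normal-approximation 95% interval. The function name `score_summary` and the 30-game match (12 wins, 10 draws, 8 losses) are hypothetical, chosen only for illustration.

```python
import math
from statistics import mean, stdev

def score_summary(results):
    """Summarize per-game scores coded +1 (win), 0 (draw), -1 (loss).

    Returns (number of games, mean score, sample standard deviation, standard error).
    """
    n = len(results)
    if n < 2:
        raise ValueError("need at least two games to estimate a standard deviation")
    if any(r not in (-1, 0, 1) for r in results):
        raise ValueError("scores must be +1, 0 or -1")
    m = mean(results)
    s = stdev(results)            # sample (n - 1) standard deviation
    se = s / math.sqrt(n)         # standard error of the mean score
    return n, m, s, se

# Hypothetical 30-game match: 12 wins, 10 draws, 8 losses.
games = [1] * 12 + [0] * 10 + [-1] * 8
n, m, s, se = score_summary(games)
print(f"n={n} mean={m:.3f} sd={s:.3f} se={se:.3f}")
# Rough 95% interval under a normal approximation (z = 1.96).
print(f"rough 95% interval: {m - 1.96 * se:+.3f} to {m + 1.96 * se:+.3f}")
```

For that hypothetical match the mean score is positive, but the rough interval still straddles zero, so these 30 games alone do not say which side is stronger.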

>Second question: when you have two almost equally strong chess programs, are you
>implying that after 30 games you can reach a sound conclusion about which one is
>stronger?

No.  After 30 games you can start to believe the measurements.

>- If you think you can answer YES, then I doubt it.

So do I.
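A rough way to see why: under a normal approximation, the 95% half-width of the mean score shrinks only with the square root of the number of games. The sketch below assumes a per-game standard deviation of 1.0 as the worst case (scores of +1 or -1 with no draws between evenly matched programs); the function names and the example edge of 0.05 points per game with a standard deviation of 0.8 are hypothetical.

```python
import math

def half_width(n_games, sd=1.0, z=1.96):
    """Approximate 95% half-width of the mean score per game after n_games (normal model)."""
    return z * sd / math.sqrt(n_games)

def games_needed(edge, sd=1.0, z=1.96):
    """Games needed before the 95% half-width shrinks to `edge` points per game.

    This only makes the interval as narrow as the edge; it ignores statistical
    power, so a real test would need more games than this.
    """
    return math.ceil((z * sd / edge) ** 2)

for n in (30, 300, 3000):
    print(f"{n:5d} games: +/- {half_width(n):.3f} points per game")
print("games for a 0.05 edge with sd 0.8:", games_needed(0.05, sd=0.8))
```

With sd = 1, 30 games leaves an interval of roughly +/-0.36 points per game, far too wide to separate two programs of nearly equal strength, while a 0.05 edge already calls for on the order of a thousand games.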

> Of course, if
>you then do what the SSDF is doing, matches between two unequal progs, you may
>well get clear results after 5 games.

But only by accident.
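To put a number on "only by accident", here is a minimal Python sketch that treats games as independent and computes exact match-outcome probabilities from a trinomial model. The per-game probabilities (45% win, 30% draw, 25% loss for the stronger program, i.e. a 60% expected score) and the function name `match_outcome_probs` are hypothetical, chosen only for illustration.

```python
import math

def match_outcome_probs(n_games, p_win, p_draw):
    """Exact probabilities that the stronger side finishes ahead, level, or behind
    over n_games independent games (trinomial model with hypothetical p_win/p_draw)."""
    p_loss = 1.0 - p_win - p_draw
    ahead = level = behind = 0.0
    for wins in range(n_games + 1):
        for draws in range(n_games - wins + 1):
            losses = n_games - wins - draws
            # Multinomial coefficient n! / (wins! draws! losses!)
            ways = math.factorial(n_games) // (
                math.factorial(wins) * math.factorial(draws) * math.factorial(losses))
            p = ways * p_win ** wins * p_draw ** draws * p_loss ** losses
            if wins > losses:
                ahead += p
            elif wins == losses:
                level += p
            else:
                behind += p
    return ahead, level, behind

for n in (5, 30):
    ahead, level, behind = match_outcome_probs(n, p_win=0.45, p_draw=0.30)
    print(f"{n:2d} games: ahead {ahead:.3f}  level {level:.3f}  behind {behind:.3f}")
```

Under those assumptions the stronger program finishes behind in about 21% of 5-game matches and level in about 18%, so a "clear" 5-game result says little; over 30 games the chance of finishing behind is noticeably smaller.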

>But of course, over 30 games the better program will build up a good
>margin.

It will be a good start.

>Also, tests are known where a directly
>competing program usually scores fewer points against this outdated program...
>
>What I want to say is this, and it has often been explained here in CCC: for good
>results you must play some thousands of games. 30 games is laughable; it
>is an irrelevant sample size.

30 games is the bare minimum at which the result may have even a
tiny scrap of validity.  If you use fewer than that, you may be reporting pure
nonsense.

This is especially the case because we know that the distribution is not exactly
Gaussian.
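Because +1/0/-1 scores are not Gaussian, one alternative that avoids the normal assumption is a percentile bootstrap: resample whole games with replacement and read the interval off the resampled means. This is a swapped-in technique, not something proposed in the thread; the function name `bootstrap_interval`, the 10,000 resamples, the seed and the 30-game data are all hypothetical.

```python
import random

def bootstrap_interval(results, n_resamples=10_000, level=0.95, seed=1):
    """Percentile bootstrap interval for the mean score per game.

    Resamples whole games with replacement, so no normal distribution is assumed.
    """
    if not results:
        raise ValueError("need at least one game")
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(results)
    # Mean score of each resampled "match" of the same length, sorted ascending.
    means = sorted(sum(rng.choices(results, k=n)) / n for _ in range(n_resamples))
    lo_idx = round((1 - level) / 2 * n_resamples)
    hi_idx = round((1 + level) / 2 * n_resamples) - 1
    return means[lo_idx], means[hi_idx]

# Hypothetical 30-game match: 12 wins, 10 draws, 8 losses.
games = [1] * 12 + [0] * 10 + [-1] * 8
low, high = bootstrap_interval(games)
print(f"bootstrap 95% interval for the mean score: {low:+.3f} to {high:+.3f}")
```

The bootstrap does not rescue a small sample: it can only resample the games actually played, so with 30 games the interval stays wide and can itself be unreliable.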




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.