Computer Chess Club Archives



Subject: Re: Standard deviations -- how many games?

Author: Dann Corbit

Date: 17:40:21 01/23/04


On January 23, 2004 at 20:38:01, Dann Corbit wrote:

>On January 23, 2004 at 20:00:30, Rolf Tueschen wrote:
>
>>On January 23, 2004 at 18:33:52, Dann Corbit wrote:
>>
>>>On January 23, 2004 at 18:20:34, Russell Reagan wrote:
>>>
>>>>On January 23, 2004 at 15:24:31, Dann Corbit wrote:
>>>>
>>>>>30 experiments is a fairly standard rule as to when you should start to trust
>>>>>the results for experimental data.
>>>>
>>>>So what does this mean for chess engine matches? You need at least 30 games? Or
>>>>30 matches? If matches, how do you determine how long each match should be?
>>>
>>>It means that with fewer than 30 games you cannot trust the answer.
>>>With more than 30 games, confidence rises.
>>>
>>>I bring up the number 30 because it is important in this case.  If you run (for
>>>instance) a 15 game contest, it would be dangerous to try to draw conclusions
>>>from it.  With 30 games or more, even something that does not perfectly model a
>>>normal distribution will start to conform to the right answers (e.g. the mean
>>>calculation will be about right, and the standard deviations will be about
>>>right unless the distribution is sharply skewed).
>>>
>>>30 games is the break-even point where deficiencies in the choice of a normal
>>>distribution as a model start to become smoothed over.
>>
>>
>>What measurements are you talking about here? Of course the N is right for
>>normally distributed variables, but what do you "measure" with chess games?
>
>+1, -1, 0

I suppose this is an odd statement.  Perhaps many will think I am off my rocker.
I imagine that I meant to say 1-0, 0-1, 1/2-1/2 so that it looks familiar.

But we have a three-state outcome, at any rate.
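
To make the three-state measurement concrete, here is a minimal sketch, assuming each game is scored +1, 0 or -1 as above. The function name match_summary and the 12/10/8 result are made up for illustration, and the 1.96 multiplier leans on the same normal approximation that is under discussion, so treat the interval as rough.

```python
import math

def match_summary(wins, draws, losses):
    """Mean score per game and its standard error, scoring +1 / 0 / -1."""
    n = wins + draws + losses
    mean = (wins - losses) / n
    # Each game is worth +1, 0 or -1, so the mean of the squares is (wins + losses) / n.
    variance = (wins + losses) / n - mean ** 2
    std_error = math.sqrt(variance / n)
    return mean, std_error

# Hypothetical 30-game match: 12 wins, 10 draws, 8 losses.
mean, se = match_summary(wins=12, draws=10, losses=8)
low, high = mean - 1.96 * se, mean + 1.96 * se
print(f"mean = {mean:+.3f}, approx. 95% interval = [{low:+.3f}, {high:+.3f}]")
```

For this made-up result the interval still straddles zero, which fits the point that 30 games is where you can start to believe the measurement, not where you can name the stronger program.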

>
>>Second question: when you have almost equally strong chess programs, are you
>>implying that after 30 games you can draw a sound conclusion about which one
>>is stronger?
>
>No.  After 30 games you can start to believe the measurements.
>
>>- If you think you can answer with YES, then I doubt it.
>
>So do I.
>
>> Of course, if
>>you then do - what the SSDF is doing - matches between two unequal progs you can
>>well get clear results after 5 games.
>
>But only by accident.
>
>>But of course 30 games will be a good
>>profit for the better program.
>
>It will be a good start.
>
>>Also, tests are known in which a directly competing prog usually gets fewer
>>points against this out-dated prog...
>>
>>What I want to say is this. It was often explained here in CCC. For good results
>>you must get into the range of thousands of games. 30 games is just for
>>laughter. It is an irrelevant species.
>
>30 games is the bare minimum at which the number may have a tiny scrap of
>validity.  If you use fewer than that, you can be reporting pure nonsense.
>
>This is especially the case because we know that it is not exactly a gaussian
>distribution.
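
The gap between "30 games" and "thousands of games" can be illustrated with a small simulation. This is only a sketch under assumed numbers: the win/draw/loss probabilities (0.35 / 0.35 / 0.30), the trial count, and the function name fraction_not_ahead are hypothetical, and games are treated as independent, which real engine matches may not be.

```python
import random

def fraction_not_ahead(n_games, trials=2000, p_win=0.35, p_draw=0.35, seed=1):
    """Fraction of simulated matches in which the slightly stronger engine
    fails to finish ahead (total score <= 0 with +1 / 0 / -1 scoring)."""
    rng = random.Random(seed)
    outcomes = [1, 0, -1]
    weights = [p_win, p_draw, 1.0 - p_win - p_draw]  # assumed probabilities
    not_ahead = 0
    for _ in range(trials):
        total = sum(rng.choices(outcomes, weights=weights, k=n_games))
        if total <= 0:
            not_ahead += 1
    return not_ahead / trials

frac30 = fraction_not_ahead(30)
frac1000 = fraction_not_ahead(1000)
print(f"30 games:   stronger engine not ahead in {frac30:.1%} of matches")
print(f"1000 games: stronger engine not ahead in {frac1000:.1%} of matches")
```

Under these assumptions a 30-game match leaves the stronger engine level or behind in a large share of runs, while a 1000-game match rarely does.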



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.