Computer Chess Club Archives


Subject: Re: Standard deviations -- how many games?

Author: Bob Durrett

Date: 18:39:15 01/23/04


On January 23, 2004 at 21:27:43, Dann Corbit wrote:

>On January 23, 2004 at 21:08:27, Bob Durrett wrote:
>
>>On January 23, 2004 at 15:24:31, Dann Corbit wrote:
>>
>>>30 experiments is a fairly standard rule as to when you should start to trust
>>>the results for experimental data.
>>>
>>>From:
>>>http://www.twoplustwo.com/mmessay8.html
>>>"A good rule of thumb is to have at least 30 observations (playing sessions) for
>>>the estimate to be reasonably accurate. However, the more the better, unless for
>>>some reason you think the game for which you are trying to estimate your
>>>standard deviation has changed significantly over some particular period of
>>>time."
>>>
>>>From:
>>>http://www.odu.edu/sci/xu/chapter3.pdf
>>>"C. The Reliability of s as a Measure of Precision - the more measurements that
>>>are made, the more reliable the value obtained for s. Usually 20 - 30
>>>measurements are necessary."
>>>
>>>From:
>>>http://www.stat.psu.edu/~resources/ClassNotes/ljs_21/ljs_21.PPT#11
>>>Concerning the central limit theorem, we have this:
>>>Even if data are not normally distributed, as long as you take “large enough”
>>>samples, the sample averages will at least be approximately normally
>>>distributed.
>>>Mean of sample averages is still mu
>>>Standard error of sample averages is still sigma/sqrt(n).
>>>In general, “large enough” means more than 30 measurements.
>>>
>>>
>>>Of course, the more the merrier, when it comes to measurements.
>>
>>I don't wish to muddy the waters too much but the fact is that chess-playing
>>programs or machines do not enter tournaments with zero information known about
>>them.  Just as in human tournaments, what is known before any games are
>>played in the tournament can be very significant.
>>
>>Consider a trivial example:  Suppose a top GM is to play a chess match against a
>>true chess beginner.  It is known a priori that the top GM is a whiz at chess and
>>the beginner is a washout.
>>
>>Will it take thirty games to determine who is better?  No, it will take ZERO
>>games.
>
>You are wrong.  It will take 30 games before we know anything about the unknown
>player.  Consider this:
>At one time, Kasparov, Fischer, and Tal had an Elo below 2000 and were
>completely unknown.  They came out of the woodwork and started blasting the
>bejabbers out of people.  Just because we know someone is talented, does not
>mean we can use that data to extrapolate the level of talent of an unknown
>entity.
>
>>The number of games required depends on the prior knowledge about the
>>contestants.
>
>There is no connection at all.  However, we will gather more and more
>information about the strength of the unknown opponent as more games are played.
>He could be weaker, stronger, or the same as the great player.  Imagine someone
>who does not play humans but has played against computers for 5 years.  He might
>be a very good player that nobody has heard of.  Of course, it is not likely
>that a player will be better than Kasparov or Anand.  But until the games are
>played, we won't know.  And 3 games against Kasparov will tell us very little.
>Even if Kasparov loses all 3 games.
>
>>I hope this is not too distressful for anybody.  : )
>
>Bad science.  Using your intuition to do science is a very bad idea.  It is good
>to form theories using intuition.  But it is bad to assert the truth of your
>feelings without testing.

Well, I never claimed to be a scientist!  : )

The fact that people change, and so do chess-playing programs, complicates the
calculations.  As you noted in another bulletin, changing the software's
version number or changing the hardware can make a significant difference.

I am not sure about the quality of published ratings of chess-playing programs,
but the published FIDE ratings for humans are based on prior games.  In a sense,
one could consider the current tournament as being an addendum or extension to
the total global "tournament."  [I hope you know what I mean.]  When results of
a tournament are used to calculate new ratings, the results of the games played
in the past have a LOT to do with the computed new ratings for the experienced
players.  I'm sure you agree.
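
To make the point about prior games concrete, here is a minimal sketch of an Elo-style rating update.  The expected-score formula is the standard Elo logistic curve; the function names and the K factor of 10 are my own assumptions for illustration, and the K values actually used by FIDE have varied over time and by player category.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def updated_rating(old_rating, opponent_ratings, scores, k=10):
    """New rating after a set of games.

    old_rating       -- rating carried in from all previous games
    opponent_ratings -- one rating per game played
    scores           -- 1, 0.5 or 0 for each game, in the same order
    k                -- K factor (assumed value, not taken from the thread)
    """
    expected = sum(expected_score(old_rating, r) for r in opponent_ratings)
    actual = sum(scores)
    return old_rating + k * (actual - expected)


# An established 2600 player wins three straight games against 2600 opposition.
new_rating = updated_rating(2600, [2600, 2600, 2600], [1, 1, 1], k=10)
print(new_rating)
```

With a small K the new rating stays close to the old one, which is just another way of saying that the earlier games carry most of the weight.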

I don't think this is using an excessive amount of intuition.  : )
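
For what it is worth, here is a rough sketch of how the number of games might depend on the size of the rating gap.  Everything in it is an assumption on my part: it takes the gap as given, ignores draws, treats each game as an independent win or loss with the Elo expected score as the win probability, and uses the usual normal approximation (which is crude for very small numbers of games).  The name games_needed and the threshold of two standard errors are hypothetical choices, not anything from the posts above.

```python
import math


def expected_score(rating_gap):
    """Standard Elo expected score for the stronger side, given the gap."""
    return 1.0 / (1.0 + 10 ** (-rating_gap / 400.0))


def games_needed(rating_gap, z=2.0):
    """Games until the expected lead exceeds z standard errors.

    Simplified model (an assumption): no draws, independent games,
    per-game standard deviation sqrt(p * (1 - p)).
    """
    if rating_gap == 0:
        raise ValueError("equal players can never be separated")
    p = expected_score(abs(rating_gap))
    sigma = math.sqrt(p * (1.0 - p))      # per-game standard deviation
    margin = p - 0.5                      # expected lead per game
    # Standard error after n games is sigma / sqrt(n); solve for n.
    return math.ceil((z * sigma / margin) ** 2)


for gap in (20, 100, 400, 1000):
    print(gap, games_needed(gap))
```

Under these assumptions a very large gap needs only a handful of games, while a gap of 20 points needs well over a thousand.  It says nothing about where the gap estimate comes from in the first place, which is the part Dann and I disagree about.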

Bob D.




Last modified: Thu, 15 Apr 21 08:11:13 -0700