Computer Chess Club Archives


Subject: Re: Standard deviations -- how many games?

Author: Dann Corbit

Date: 18:51:10 01/23/04

On January 23, 2004 at 21:39:15, Bob Durrett wrote:

>On January 23, 2004 at 21:27:43, Dann Corbit wrote:
>
>>On January 23, 2004 at 21:08:27, Bob Durrett wrote:
>>
>>>On January 23, 2004 at 15:24:31, Dann Corbit wrote:
>>>
>>>>Thirty experiments is a fairly standard rule of thumb for when you should start
>>>>to trust the results of experimental data.
>>>>
>>>>From:
>>>>http://www.twoplustwo.com/mmessay8.html
>>>>"A good rule of thumb is to have at least 30 observations (playing sessions) for
>>>>the estimate to be reasonably accurate. However, the more the better, unless for
>>>>some reason you think the game for which you are trying to estimate your
>>>>standard deviation has changed significantly over some particular period of
>>>>time."
>>>>
>>>>From:
>>>>http://www.odu.edu/sci/xu/chapter3.pdf
>>>>"C. The Reliability of s as a Measure of Precision - the more measurements that
>>>>are made, the more reliable the value obtained for s. Usually 20 - 30
>>>>measurements are necessary."
>>>>
>>>>From:
>>>>http://www.stat.psu.edu/~resources/ClassNotes/ljs_21/ljs_21.PPT#11
>>>>Concerning the central limit theorem, we have this:
>>>>Even if data are not normally distributed, as long as you take “large enough”
>>>>samples, the sample averages will at least be approximately normally
>>>>distributed.
>>>>The mean of the sample averages is still mu.
>>>>The standard error of the sample averages is still sigma/sqrt(n).
>>>>In general, “large enough” means more than 30 measurements.
>>>>
>>>>
>>>>Of course, the more the merrier, when it comes to measurements.
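
To make the sigma/sqrt(n) rule of thumb above concrete, here is a minimal
sketch in Python.  The scores below are made up purely for illustration
(1 for a win, 0.5 for a draw, 0 for a loss); the only point is how the
standard error shrinks as the number of games grows.

import math

# Made-up results of one engine-vs-engine match: 1 = win, 0.5 = draw, 0 = loss.
scores = [1, 0.5, 0, 1, 1, 0.5, 0.5, 0, 1, 0.5,
          1, 0, 0.5, 1, 0.5, 0.5, 1, 0, 0.5, 1,
          0.5, 1, 0.5, 0, 1, 0.5, 1, 0.5, 0, 1]   # 30 games

n = len(scores)
mean = sum(scores) / n

# Sample standard deviation s (n - 1 in the denominator).
s = math.sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))

# The standard error of the average score shrinks like sigma / sqrt(n).
std_err = s / math.sqrt(n)

print("games:", n)
print("average score: %.3f" % mean)
print("sample standard deviation: %.3f" % s)
print("standard error of the average: %.3f" % std_err)

# Halving the standard error requires four times as many games, which is
# why "more is better" gets expensive quickly.
print("games needed to halve the standard error: about", 4 * n)

Nothing in the sketch is specific to chess; it is just the standard error of
a sample mean applied to game scores.
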
>>>
>>>I don't wish to muddy the waters too much, but the fact is that chess-playing
>>>programs or machines do not enter tournaments with zero information known about
>>>them.  Just as in human tournaments, knowledge available before any games are
>>>played in the tournament can be very significant.
>>>
>>>Consider a trivial example:  Suppose a top GM is to play a chess match against a
>>>true chess beginner.  It is known a priori that the top GM is a whiz at chess and
>>>the beginner is a washout.
>>>
>>>Will it take thirty games to determine who is better?  No, it will take ZERO
>>>games.
>>
>>You are wrong.  It will take 30 games before we know anything about the unknown
>>player.  Consider this:
>>At one time, Kasparov, Fischer, and Tal had an Elo below 2000 and were
>>completely unknown.  They came out of the woodwork and started blasting the
>>bejabbers out of people.  Just because we know someone is talented does not
>>mean we can use that data to extrapolate the level of talent of an unknown
>>entity.
>>
>>>The number of games required depends on the prior knowledge about the
>>>contestants.
>>
>>There is no connection at all.  However, we will gather more and more
>>information about the strength of the unknown opponent as more games are played.
>> He could be weaker, stronger, or the same as the great player.  Imagine someone
>>who does not play humans but has played against computers for 5 years.  He might
>>be a very good player that nobody has heard of.  Of course, it is not likely
>>that a player will be better than Kasparov or Anand.  But until the games are
>>played, we won't know.  And 3 games against Kasparov will tell us very little.
>>Even if Kasparov loses all 3 games.
>>
>>>I hope this is not too distressful for anybody.  : )
>>
>>Bad science.  Using your intuition to do science is a very bad idea.  It is good
>>to form theories using intuition.  But it is bad to assert the truth of your
>>feelings without testing.
>
>Well, I never claimed to be a scientist!  : )
>
>The fact that people change, and so do chess-playing programs, complicates the
>calculations.  As you noted in another bulletin, changing the software's
>version number or changing the hardware can make a significant difference.
>
>I am not sure about the quality of published ratings of chess-playing programs,
>but the published FIDE ratings for humans are based on prior games.  In a sense,
>one could consider the current tournament as being an addendum or extension to
>the total global "tournament."  [I hope you know what I mean.]  When results of
>a tournament are used to calculate new ratings, the results of the games played
>in the past have a LOT to do with the computed new ratings for the experienced
>players.  I'm sure you agree.
>
>I don't think this is using an excessive amount of intuition.  : )

I am in agreement with this.  It is one area where the computer tests are
clearly superior.  That is because the computer hardware and the computer
program are not allowed to change during the experiment.  There are some
computer programs that do learn, and I don't know if that learning can really be
completely disabled.  So in that sense, the experiments can have a moving target.

This is the toughest part of the analysis, I think.  Another problem is that if
we disable the learning, then the computer program will not behave the way it
does when you actually use it.  So we may have a more repeatable result, but it
is not a result that reflects how the program is actually used in practice.
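
For what it's worth, the way prior games feed into a new rating can be seen in
the usual Elo update rule.  Here is a minimal sketch in Python; the K factor
of 10 and the example ratings are just illustrative values, and FIDE's actual
regulations differ in detail:

# Expected score of a player rated r_a against a player rated r_b.
def expected_score(r_a, r_b):
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

# Rating update after one game: actual is 1 for a win, 0.5 for a draw,
# 0 for a loss.  K controls how much one new result can move a rating
# that already rests on many prior games (example value only).
def update(r_a, r_b, actual, k=10):
    return r_a + k * (actual - expected_score(r_a, r_b))

# A 2700 beating a 2000 gains almost nothing, because the prior rating
# already predicted that result.
print(update(2700, 2000, 1))   # about 2700.2

# Losing the same game costs far more.
print(update(2700, 2000, 0))   # about 2690.2

That asymmetry is exactly the "addendum to the total global tournament" idea:
the new games only nudge an estimate that is already anchored by everything
played before.
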


