Computer Chess Club Archives



Subject: Re: Standard deviations -- how many games?

Author: Bob Durrett

Date: 18:56:03 01/23/04



On January 23, 2004 at 21:51:10, Dann Corbit wrote:

>On January 23, 2004 at 21:39:15, Bob Durrett wrote:
>
>>On January 23, 2004 at 21:27:43, Dann Corbit wrote:
>>
>>>On January 23, 2004 at 21:08:27, Bob Durrett wrote:
>>>
>>>>On January 23, 2004 at 15:24:31, Dann Corbit wrote:
>>>>
>>>>>30 experiments is a fairly standard rule as to when you should start to trust
>>>>>the results for experimental data.
>>>>>
>>>>>From:
>>>>>http://www.twoplustwo.com/mmessay8.html
>>>>>"A good rule of thumb is to have at least 30 observations (playing sessions) for
>>>>>the estimate to be reasonably accurate. However, the more the better, unless for
>>>>>some reason you think the game for which you are trying to estimate your
>>>>>standard deviation has changed significantly over some particular period of
>>>>>time."
>>>>>
>>>>>From:
>>>>>http://www.odu.edu/sci/xu/chapter3.pdf
>>>>>"C. The Reliability of s as a Measure of Precision - the more measurements that
>>>>>are made, the more reliable the value obtained for s. Usually 20 - 30
>>>>>measurements are necessary."
>>>>>
>>>>>From:
>>>>>http://www.stat.psu.edu/~resources/ClassNotes/ljs_21/ljs_21.PPT#11
>>>>>Concerning the central limit theorem, we have this:
>>>>>Even if data are not normally distributed, as long as you take “large enough”
>>>>>samples, the sample averages will at least be approximately normally
>>>>>distributed.
>>>>>Mean of sample averages is still mu
>>>>>Standard error of sample averages is still sigma/sqrt(n).
>>>>>In general, “large enough” means more than 30 measurements.
>>>>>
>>>>>
>>>>>Of course, the more the merrier, when it comes to measurements.
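
A rough Python sketch of the quoted scaling, assuming a hypothetical
win/loss game with a 55% per-game win probability (both numbers are
arbitrary illustration values, not anything measured in this thread):

# Simulate repeated "sessions" of n games each and compare the spread
# of the session averages against the sigma/sqrt(n) prediction.
import random
import statistics

random.seed(42)

def session_average(n_games, p_win=0.55):
    """Average score over n_games; each game is a win (1) or loss (0)."""
    return sum(1.0 if random.random() < p_win else 0.0
               for _ in range(n_games)) / n_games

for n in (5, 30, 200):
    averages = [session_average(n) for _ in range(2000)]
    observed = statistics.stdev(averages)
    # For a single win/loss game, sigma = sqrt(p * (1 - p)).
    predicted = (0.55 * 0.45) ** 0.5 / n ** 0.5
    print(f"n={n:4d}  observed={observed:.4f}  sigma/sqrt(n)={predicted:.4f}")

The spread shrinks like 1/sqrt(n), so halving the standard error costs
four times the games.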
>>>>
>>>>I don't wish to muddy the waters too much, but the fact is that chess-playing
>>>>programs or machines do not enter tournaments with zero information known about
>>>>them.  Just as in human tournaments, knowledge available before any tournament
>>>>games are played can be very significant.
>>>>
>>>>Consider a trivial example:  Suppose a top GM is to play a chess match against a
>>>>true chess beginner.  It is known a priori that the top GM is a whiz at chess and
>>>>the beginner is a washout.
>>>>
>>>>Will it take thirty games to determine who is better?  No, it will take ZERO
>>>>games.
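
One way to make the "prior knowledge" argument concrete is a conjugate
Beta prior on the favorite's per-game win probability; the prior
strengths below are arbitrary illustration values, not anything from
the thread:

# Beta-Binomial updating: a confident prior barely moves after a few
# games, while a flat prior moves a lot.
def beta_mean(alpha, beta):
    return alpha / (alpha + beta)

def update(alpha, beta, wins, losses):
    """Conjugate update: observed wins/losses add to the Beta counts."""
    return alpha + wins, beta + losses

priors = {
    "GM vs. beginner (confident prior)": (99.0, 1.0),
    "two unknowns (flat prior)": (1.0, 1.0),
}
for name, (a, b) in priors.items():
    a2, b2 = update(a, b, wins=3, losses=0)
    print(f"{name}: prior mean {beta_mean(a, b):.3f} -> "
          f"posterior mean after 3 wins {beta_mean(a2, b2):.3f}")

With the confident prior the three results are nearly redundant; with
the flat prior they carry almost all of the information, which is also
the substance of the rebuttal that follows.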
>>>
>>>You are wrong.  It will take 30 games before we know anything about the unknown
>>>player.  Consider this:
>>>At one time, Kasparov, Fischer, and Tal had an Elo below 2000 and were
>>>completely unknown.  They came out of the woodwork and started blasting the
>>>bejabbers out of people.  Just because we know someone is talented does not
>>>mean we can use that data to extrapolate the level of talent of an unknown
>>>entity.
>>>
>>>>The number of games required depends on the prior knowledge about the
>>>>contestants.
>>>
>>>There is no connection at all.  However, we will gather more and more
>>>information about the strength of the unknown opponent as more games are played.
>>>He could be weaker, stronger, or the same as the great player.  Imagine someone
>>>who does not play humans but has played against computers for 5 years.  He might
>>>be a very good player that nobody has heard of.  Of course, it is not likely
>>>that a player will be better than Kasparov or Anand.  But until the games are
>>>played, we won't know.  And 3 games against Kasparov will tell us very little,
>>>even if Kasparov loses all 3 games.
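
The "very little" is easy to quantify with an exact binomial interval;
a minimal sketch for the 3-0 case:

# Exact (Clopper-Pearson) 95% interval for a win probability after
# 3 wins in 3 games.  When every game is won, the lower bound reduces
# to (alpha/2) ** (1/n) and the upper bound is 1.
alpha = 0.05
n = 3
lower = (alpha / 2) ** (1.0 / n)
print(f"95% CI after 3 straight wins: [{lower:.2f}, 1.00]")

That interval, roughly [0.29, 1.00], is consistent with anything from a
clear underdog to total dominance.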
>>>
>>>>I hope this is not too distressful for anybody.  : )
>>>
>>>Bad science.  Using your intuition to do science is a very bad idea.  It is good
>>>to form theories using intuition.  But it is bad to assert the truth of your
>>>feelings without testing.
>>
>>Well, I never claimed to be a scientist!  : )
>>
>>The fact that people change, and so do chess-playing programs, complicates the
>>calculations.  As you noted in another bulletin, changing the software's
>>version number or changing the hardware can make a significant difference.
>>
>>I am not sure about the quality of published ratings of chess-playing programs,
>>but the published FIDE ratings for humans are based on prior games.  In a sense,
>>one could consider the current tournament as being an addendum or extension to
>>the total global "tournament."  [I hope you know what I mean.]  When results of
>>a tournament are used to calculate new ratings, the results of the games played
>>in the past have a LOT to do with the computed new ratings for the experienced
>>players.  I'm sure you agree.
>>
>>I don't think this is using an excessive amount of intuition.  : )
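
The standard Elo update formula shows exactly how the prior games
enter: the published ratings fix the expected score, and only the
surprise moves the rating.  A minimal sketch (the K-factor and the
example ratings are arbitrary illustration values):

# Standard Elo update: new = old + K * (score - expected).
def expected_score(rating, opponent):
    """Expected score from the Elo logistic curve."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))

def update_rating(rating, opponent, score, k=10.0):
    """score is 1 for a win, 0.5 for a draw, 0 for a loss."""
    return rating + k * (score - expected_score(rating, opponent))

# A 2600 who beats a 2500 gains only ~3.6 points, because the prior
# ratings already predicted the result.
print(update_rating(2600, 2500, score=1.0))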
>
>I am in agreement with this.  It is one area where the computer tests are
>clearly superior.  That is because the computer hardware and the computer
>program are not allowed to change during the experiment.  There are some
>computer programs that do learn, and I don't know whether that learning can
>really be completely disabled.  So in that sense, the experiments can have a
>moving target.
>
>This is the toughest part of the analysis, I think.  Another problem is that if
>we disable the learning, then the computer program will not behave the way it
>does when you actually use it.  So we may have a more repeatable result, but it
>is not a result that would ever be used in practice.

Life is imperfect.  : )

Bob D.


