Author: Bob Durrett
Date: 18:56:03 01/23/04
On January 23, 2004 at 21:51:10, Dann Corbit wrote:

>On January 23, 2004 at 21:39:15, Bob Durrett wrote:
>
>>On January 23, 2004 at 21:27:43, Dann Corbit wrote:
>>
>>>On January 23, 2004 at 21:08:27, Bob Durrett wrote:
>>>
>>>>On January 23, 2004 at 15:24:31, Dann Corbit wrote:
>>>>
>>>>>30 experiments is a fairly standard rule as to when you should start to trust
>>>>>the results for experimental data.
>>>>>
>>>>>From:
>>>>>http://www.twoplustwo.com/mmessay8.html
>>>>>"A good rule of thumb is to have at least 30 observations (playing sessions) for
>>>>>the estimate to be reasonably accurate. However, the more the better, unless for
>>>>>some reason you think the game for which you are trying to estimate your
>>>>>standard deviation has changed significantly over some particular period of
>>>>>time."
>>>>>
>>>>>From:
>>>>>http://www.odu.edu/sci/xu/chapter3.pdf
>>>>>"C. The Reliability of s as a Measure of Precision - the more measurements that
>>>>>are made, the more reliable the value obtained for s. Usually 20 - 30
>>>>>measurements are necessary."
>>>>>
>>>>>From:
>>>>>http://www.stat.psu.edu/~resources/ClassNotes/ljs_21/ljs_21.PPT#11
>>>>>Concerning the central limit theorem, we have this:
>>>>>Even if data are not normally distributed, as long as you take “large enough”
>>>>>samples, the sample averages will at least be approximately normally
>>>>>distributed.
>>>>>Mean of sample averages is still mu.
>>>>>Standard error of sample averages is still sigma/sqrt(n).
>>>>>In general, “large enough” means more than 30 measurements.
>>>>>
>>>>>Of course, the more the merrier, when it comes to measurements.
>>>>
>>>>I don't wish to muddy the waters too much, but the fact is that chess-playing
>>>>programs or machines do not enter tournaments with zero information known about
>>>>them. Just as in human tournaments, knowledge available before any games are
>>>>played in the tournament can be very significant.
>>>>
>>>>Consider a trivial example: Suppose a top GM is to play a chess match against a
>>>>true chess beginner. It is known a priori that the top GM is a whiz at chess and
>>>>the beginner is a washout.
>>>>
>>>>Will it take thirty games to determine who is better? No, it will take ZERO
>>>>games.
>>>
>>>You are wrong. It will take 30 games before we know anything about the unknown
>>>player. Consider this: at one time, Kasparov, Fischer, and Tal had an Elo below
>>>2000 and were completely unknown. They came out of the woodwork and started
>>>blasting the bejabbers out of people. Just because we know someone is talented
>>>does not mean we can use that data to extrapolate the level of talent of an
>>>unknown entity.
>>>
>>>>The number of games required depends on the prior knowledge about the
>>>>contestants.
>>>
>>>There is no connection at all. However, we will gather more and more
>>>information about the strength of the unknown opponent as more games are played.
>>>He could be weaker, stronger, or the same as the great player. Imagine someone
>>>who does not play humans but has played against computers for 5 years. He might
>>>be a very good player that nobody has heard of. Of course, it is not likely
>>>that a player will be better than Kasparov or Anand. But until the games are
>>>played, we won't know. And 3 games against Kasparov will tell us very little,
>>>even if Kasparov loses all 3 games.
>>>
>>>>I hope this is not too distressful for anybody. : )
>>>
>>>Bad science. Using your intuition to do science is a very bad idea.
>>>It is good to form theories using intuition. But it is bad to assert the
>>>truth of your feelings without testing.
>>
>>Well, I never claimed to be a scientist! : )
>>
>>The fact that people change, and so do chess-playing programs, complicates the
>>calculations. As you noted in another bulletin, changing the software's version
>>or changing the hardware can make a significant difference.
>>
>>I am not sure about the quality of published ratings of chess-playing programs,
>>but the published FIDE ratings for humans are based on prior games. In a sense,
>>one could consider the current tournament as an addendum or extension to the
>>total global "tournament." [I hope you know what I mean.] When the results of a
>>tournament are used to calculate new ratings, the results of the games played
>>in the past have a LOT to do with the computed new ratings for the experienced
>>players. I'm sure you agree.
>>
>>I don't think this is using an excessive amount of intuition. : )
>
>I am in agreement with this. It is one area where the computer tests are
>clearly superior. That is because the computer hardware and the computer
>program are not allowed to change during the experiment. There are some
>computer programs that do learn, and I don't know if it can really be
>completely disabled. So in that sense, the experiments can have a moving
>target.
>
>This is the toughest part of the analysis, I think. Another problem is that if
>we disable the learning, then the computer program will not behave the way it
>does when you actually use it. So we may have a more repeatable result, but it
>is not a result that would ever be used in practice.

Life is imperfect. : )

Bob D.
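
The sigma/sqrt(n) point quoted above is easy to check numerically. Below is a
minimal sketch in Python (not from the thread; the per-game score, standard
deviation, and random seed are illustrative assumptions) that simulates series
of games and prints the approximate standard error of the observed score. It
shows why something on the order of 30 games is the usual rule of thumb before
the estimate starts to settle.

    import math
    import random

    random.seed(1)       # fixed seed, only for repeatability
    TRUE_SCORE = 0.55    # hypothetical per-game expected score of program A vs. B
    SIGMA = 0.5          # rough per-game standard deviation of a win/loss outcome

    for n in (5, 10, 30, 100, 300):
        # Simulate n games: 1 = win for A, 0 = loss (draws left out for simplicity).
        results = [1 if random.random() < TRUE_SCORE else 0 for _ in range(n)]
        mean = sum(results) / n
        stderr = SIGMA / math.sqrt(n)  # sigma / sqrt(n), per the central limit theorem
        print(f"n={n:4d}  observed score={mean:.3f}  approx. standard error={stderr:.3f}")

Even at n = 30 the standard error is still about 0.09 in score, which near an
even match corresponds to an uncertainty of very roughly 60 Elo, so more games
are always better, exactly as the quoted sources say.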
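The point about prior games dominating the new rating can also be made
concrete. Under the standard Elo update, the new rating is the old rating plus
K times (actual score minus expected score), so a single result moves an
established rating by at most K points. The sketch below uses hypothetical
ratings and a hypothetical K-factor, chosen only to illustrate the
GM-versus-beginner example from the thread.

    def expected_score(r_a, r_b):
        # Expected score of player A against player B under the Elo model.
        return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

    def update(rating, expected, actual, k=10.0):
        # New rating after one game: old rating plus K * (actual - expected).
        return rating + k * (actual - expected)

    gm, beginner = 2750.0, 1200.0     # hypothetical prior ratings
    e = expected_score(gm, beginner)  # ~0.9999: the prior alone already predicts the result
    print(f"expected score for the GM: {e:.4f}")
    print(f"GM rating after a win:  {update(gm, e, 1.0):.1f}")   # essentially unchanged
    print(f"GM rating after a loss: {update(gm, e, 0.0):.1f}")   # drops by about K

This is the sense in which prior knowledge carries almost all of the
information: the expected score is effectively 1 before a move is played, and
one game barely moves the ratings. For a genuinely unknown player there is no
usable prior rating, which is the other side of the argument about needing on
the order of 30 games.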