Author: Dann Corbit
Date: 10:41:18 08/30/05
On August 30, 2005 at 11:55:33, Peter Berger wrote:

>On August 29, 2005 at 18:51:53, Dann Corbit wrote:
>
>>On August 29, 2005 at 18:21:33, Peter Berger wrote:
>>
>>>On August 29, 2005 at 10:40:54, Kurt Utzinger wrote:
>>>
>>>>On August 29, 2005 at 06:36:42, Jorge Pichard wrote:
>>>>
>>>>>   Engine  Score
>>>>>1: Spike10 16/26 1=010==1===110=110110=11==
>>>>>2: Zappa   10/26 0=101==0===001=001001=00==
>>>>>···············
>>>>
>>>> After only 26 games and a winning score of 61 %
>>>> it's too early for such a statement I think.
>>>> Kurt
>>>
>>>That's true. But only barely.
>>>
>>>Assuming that everything is set up properly, games are independent events (aka
>>>no learning) and that white and black have same likeliness to win (just for sake
>>>of correctness, I am actually pretty sure this doesn't make a major difference),
>>>
>>>the result is good enough to claim that Spike is better with 90% confidence. And
>>>only one more win in the following game would have been enough for 95%
>>>confidence in fact ;) .
>>>
>>>How do you feel about this one?
>>>
>>>A 1 1 1 1 1
>>>B 0 0 0 0 0
>>>
>>>More games needed? Not if you can live with 97% confidence.
>>
>>Of course, if we recall the Cadaques tournament of some years ago, it started as
>>a whitewash for Junior, but Junior eventually lost (possibly due to learning so
>>your statement above may apply).
>
>Well, 97% doesn't equal 100% anyway. The Cadaques results were very unlikely
>indeed, which probably should at first suggest to have a very close look at the
>test conditions. As far as I understand it was a basement event.
>
>But then every week there are also people who win the lottery - so these things
>*do* happen :).
>
>Under valid and controlled conditions it still seems logical to me to stop a
>test after a 5-0 result and conclude that the winning program is probably the
>stronger one.
>
>>
>>>Hmm, let's go back to the imagined 17/27 from Spike. We need more games?
>>>
>>>OK. Let's look at this result:
>>>
>>>Wins: 12
>>>Loss: 5
>>>Draws: 100000
>>>
>>>Better? Worse? No, the same.
>>
>>I don't put much credence in any result of less than 30 games.
>>After 30 games, then you get a lot more plausibility.
>
>You didn't give any reason for this, so I don't understand. A 6-0 says more
>about engine strength than the above match result with over 100000 games.

If I play those same two engines 100000 times, I am very, very sure I will know which one is stronger, and with far more confidence than after just 6 games.

I will also trust the 30 game result much more than the 6 game result.

Google queries like this one:
http://www.google.com/search?hl=en&lr=&q=%28%2230+trials%22+OR+%22thirty+trials%22+OR+%2230+measurements%22+OR+%22thirty+measurements%22%29+statistics
will show that 30 or more trials is widely cited as having very good statistical properties.

For example:
"A larger issue lurks here: how many trials counts as "many" trials? There are several rules of thumb. One rule holds that above 30 trials, the data set will conform well to the ideal chance distribution (the normal distribution, which we will examine later)."
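For anyone who wants to check the confidence figures quoted above, here is a minimal sketch. It assumes a one-sided exact sign test on the decisive games only, with draws thrown away; that is my guess at how such figures are usually produced, not something stated in the thread. The function name sign_test_p_value is made up for this illustration.

```python
from math import comb


def sign_test_p_value(wins: int, losses: int) -> float:
    """One-sided exact binomial (sign test) p-value.

    Probability of seeing at least `wins` wins in wins + losses decisive
    games if both engines were equally likely to win each decisive game.
    Draws are ignored entirely (an assumption of this sketch).
    """
    n = wins + losses
    tail = sum(comb(n, k) for k in range(wins, n + 1))
    return tail / 2 ** n


# Results discussed in the thread, reduced to (wins, losses).
results = [
    ("5-0", 5, 0),
    ("6-0", 6, 0),
    ("Spike vs Zappa, 11 wins 5 losses", 11, 5),
    ("12 wins 5 losses", 12, 5),
]

for label, wins, losses in results:
    p = sign_test_p_value(wins, losses)
    print(f"{label}: p = {p:.4f}, confidence = {1 - p:.1%}")
```

Because the draw count never enters the calculation, 12 wins and 5 losses gives the same figure under this test whether there are 10 draws or 100000. A test that does use the draws, or a larger sample of decisive games, would behave differently.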
Last modified: Thu, 15 Apr 21 08:11:13 -0700