Author: Peter Berger
Date: 08:55:33 08/30/05
On August 29, 2005 at 18:51:53, Dann Corbit wrote:

>On August 29, 2005 at 18:21:33, Peter Berger wrote:
>
>>On August 29, 2005 at 10:40:54, Kurt Utzinger wrote:
>>
>>>On August 29, 2005 at 06:36:42, Jorge Pichard wrote:
>>>
>>>>   Engine   Score
>>>>1: Spike10  16/26  1=010==1===110=110110=11==
>>>>2: Zappa    10/26  0=101==0===001=001001=00==
>>>
>>>  After only 26 games and a winning score of 61 %
>>>  it's too early for such a statement I think.
>>>  Kurt
>>
>>That's true. But only barely.
>>
>>Assuming that everything is set up properly, games are independent events ( aka
>>no learning) and that white and black have same likeliness to win (just for sake
>>of correctness, I am actually pretty sure this doesn't make a major difference),
>>
>>the result is good enough to claim that Spike is better with 90% confidence. And
>>only one more win in the following game would have been enough for 95%
>>confidence in fact ;) .
>>
>>How do you feel about this one?
>>
>>A 1 1 1 1 1
>>B 0 0 0 0 0
>>
>>More games needed? Not if you can live with 97% confidence .
>
>Of course, if we recall the Cadaques tournament of some years ago, it started as
>a whitewash for Junior, but Junior eventually lost (possibly due to learning so
>your statement above may apply).

Well, 97% doesn't equal 100% anyway. The Cadaques results were very unlikely
indeed, which should first of all prompt a very close look at the test
conditions. As far as I understand, it was a basement event. But then every week
there are also people who win the lottery, so these things *do* happen :).

Under valid and controlled conditions it still seems logical to me to stop a
test after a 5-0 result and conclude that the winning program is probably the
stronger one.

>
>>Hmm, let's go back to the imagined 17/27 from Spike. We need more games?
>>
>>OK. Let's look at this result:
>>
>>Wins: 12
>>Loss: 5
>>Draws: 100000
>>
>>Better? Worse? No, the same.
>
>I don't put much credence in any result of less than 30 games.
>After 30 games, then you get a lot more plausibility.

You didn't give any reason for this, so I don't understand. A 6-0 says more
about engine strength than the above match result with over 100000 games.

Peter
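
The confidence figures in this exchange can be checked with a one-sided sign test on decisive games only, with draws discarded. That choice of method is an assumption, since the post does not name the test it used. A minimal Python sketch (the function name `sign_test_confidence` is hypothetical):

```python
from math import comb


def sign_test_confidence(wins: int, losses: int, draws: int = 0) -> float:
    """Confidence that the winner is stronger, via a one-sided sign test.

    Assumptions (not stated in the post): games are independent, each
    decisive game is a fair coin flip if the engines are equal, and draws
    carry no information, so `draws` is accepted but deliberately ignored.
    """
    n = wins + losses  # decisive games only
    # Probability of seeing at least `wins` wins if both engines were equal.
    tail = sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n
    return 1.0 - tail


if __name__ == "__main__":
    # Match results mentioned in the thread, as (wins, losses, draws).
    for w, l, d in [(11, 5, 10), (12, 5, 10), (12, 5, 100000), (5, 0, 0), (6, 0, 0)]:
        print(f"+{w} -{l} ={d}: {sign_test_confidence(w, l, d):.3f}")
```

Under these assumptions the 11-5 result from the actual match gives roughly 0.895, 5-0 gives roughly 0.969 and 6-0 roughly 0.984, in line with the 90% and 97% figures quoted. The imagined 12-5 gives roughly 0.928 with any number of draws, which illustrates the "No, the same" point but falls a little short of the 95% mentioned, so the post may have relied on a different method or on rounding.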