Computer Chess Club Archives


Subject: Re: Spike 1.0 Mainz is too strong for Zappa 1.1 so far 16 to 10

Author: Peter Berger

Date: 08:55:33 08/30/05


On August 29, 2005 at 18:51:53, Dann Corbit wrote:

>On August 29, 2005 at 18:21:33, Peter Berger wrote:
>
>>On August 29, 2005 at 10:40:54, Kurt Utzinger wrote:
>>
>>>On August 29, 2005 at 06:36:42, Jorge Pichard wrote:
>>>
>>>>   Engine  Score
>>>>1: Spike10 16/26  1=010==1===110=110110=11==
>>>>2: Zappa   10/26  0=101==0===001=001001=00==
>>>
>>>      After only 26 games and a winning score of 61 %
>>>      it's too early for such a statement I think.
>>>      Kurt
>>
>>That's true. But only barely.
>>
>>Assuming that everything is set up properly, that games are independent events
>>(i.e. no learning) and that white and black are equally likely to win (just for
>>the sake of correctness, I am actually pretty sure this doesn't make a major
>>difference),
>>
>>the result is good enough to claim that Spike is better with 90% confidence. And
>>only one more win in the following game would have been enough for 95%
>>confidence in fact ;) .
>>
>>How do you feel about this one?
>>
>>A 1 1 1 1 1
>>B 0 0 0 0 0
>>
>>More games needed? Not if you can live with 97% confidence.
>
>Of course, if we recall the Cadaques tournament of some years ago, it started as
>a whitewash for Junior, but Junior eventually lost (possibly due to learning, so
>your statement above may apply).

Well, 97% doesn't equal 100% anyway. The Cadaques results were very unlikely
indeed, which should probably first prompt a very close look at the test
conditions. As far as I understand, it was a basement event.

But then every week there are also people who win the lottery - so these things
*do* happen :).

Under valid and controlled conditions it still seems logical to me to stop a
test after a 5-0 result and conclude that the winning program is probably the
stronger one.
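
A rough sketch of the arithmetic behind a figure like 97%, assuming a one-sided
sign test: draws are ignored, and under the null hypothesis each decisive game
is a fair coin flip. The post does not say which method was used, so both the
test and the function name sign_test_confidence are assumptions made for
illustration.

```python
from math import comb


def sign_test_confidence(wins: int, losses: int) -> float:
    """Return 1 minus the one-sided sign-test p-value.

    Assumption: draws are discarded and, if both engines were equally
    strong, every decisive game would be won by either side with
    probability 1/2.
    """
    decisive = wins + losses
    # Number of equally likely outcomes with at least `wins` wins.
    tail = sum(comb(decisive, k) for k in range(wins, decisive + 1))
    return 1 - tail / 2**decisive


print(round(sign_test_confidence(5, 0), 3))   # 0.969 (5-0 sweep)
print(round(sign_test_confidence(11, 5), 3))  # 0.895 (Spike's 11 wins, 5 losses)
```

Under these assumptions a 5-0 sweep gives about 97%, and the 11 wins and 5
losses behind Spike's 16/26 give about 90%.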

>
>>Hmm, let's go back to the imagined 17/27 from Spike. We need more games?
>>
>>OK. Let's look at this result:
>>
>>Wins: 12
>>Loss: 5
>>Draws: 100000
>>
>>Better? Worse? No, the same.
>
>I don't put much credence in any result of less than 30 games.
>After 30 games, then you get a lot more plausibility.

You didn't give any reason for this, so I don't understand. A 6-0 says more
about engine strength than the above match result with over 100000 games.
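
The comparison can be checked with a short sketch, again assuming a one-sided
sign test in which only decisive games count. The helper one_sided_p_value and
its draws parameter are hypothetical names; the parameter exists only to make
explicit that draws are dropped.

```python
from fractions import Fraction
from math import comb


def one_sided_p_value(wins: int, losses: int, draws: int = 0) -> Fraction:
    """Exact chance of at least `wins` wins if both engines were equal.

    Assumption: a sign test, so `draws` is accepted but deliberately
    unused; only the decisive games enter the calculation.
    """
    decisive = wins + losses
    tail = sum(comb(decisive, k) for k in range(wins, decisive + 1))
    return Fraction(tail, 2**decisive)


p_sweep = one_sided_p_value(6, 0)
p_long = one_sided_p_value(12, 5, draws=100000)

print(p_sweep, f"{float(p_sweep):.4f}")  # 1/64 0.0156
print(p_long, f"{float(p_long):.4f}")    # 4701/65536 0.0717
```

The smaller p-value belongs to the 6-0, and adding 100000 draws to the 12-5
leaves its p-value unchanged. Under this particular test the 12-5 works out to
roughly 93% confidence, so the 95% quoted earlier may rest on a different
calculation.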

Peter



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.