Computer Chess Club Archives


Subject: Re: Once again reminder: 10 games is NOTHING in comp-comp play...

Author: Amir Ban

Date: 09:34:49 11/23/98


On November 23, 1998 at 11:51:21, Christophe Theron wrote:

>On November 23, 1998 at 09:42:25, Micheal Cummings wrote:
>
>>
>>On November 23, 1998 at 09:19:42, Jouni Uski wrote:
>>
>>>I started to play Wcrafty 16.1 against Comet A.96. After 10 games
>>>Comet was leading by 7.5 - 2.5. Something is wrong I thought! But
>>>after 34 games we see the real situation.
>>>
>>>Comet   1 1 1 0.5 1 1 0 0 1 1    (7.5)
>>>Wcrafty 0 0 0 0.5 0 0 1 1 0 0    (2.5)
>>>                                                                   total
>>>Comet   1 0 0 0 0.5 0 1 0 1 0.5 0 0 0 1 0 0 0 0 0 1 0 0 0.5 1      15
>>>Wcrafty 0 1 1 1 0.5 1 0 1 0 0.5 1 1 1 0 1 1 1 1 1 0 1 1 0.5 0      19
>>>
>>>So please no conclusions after 10 games - we need about 40.
>>>
>>>Jouni
>>
>>
>>You need more than 40, and that is quite a big swing that Crafty made to
>>eventually win over Comet. I do not know what to conclude after those sets of
>>games. I would like to see some of the games, though, to see why there was such
>>a big swing.
>>
>>Not that I do not believe you
>
>
>It is not a big swing. Run computer matches every day and you will notice this
>all the time.
>
>If you want to check this, flip a coin 30 times, and compute the score of heads
>versus tails after every flip. Notice the swings in the first 20 results.
>
>I generally use 60-game matches and consider them to be +/- 2.5% accurate.
>
>That is, even if prog A scores 52.5% against prog B over 60 games, I consider it
>impossible to say which is better. I say A is better if it scores above
>52.5%.
>
>For 30-game matches I would take a +/- 5% margin of error.
>
>In the case of the Crafty/Comet match above, the result is 55.9% in favor of
>Crafty over 34 games, so I would conclude that Crafty is better. But you have to
>realize that the confidence in this statement is not high, so if I had to bet I
>would not bet too much.
>
>
>    Christophe


Christophe, can I borrow your statistics book? My book is much more
pessimistic. It tells me that for 60 games, all results narrower than 38-22 are
not 95%-significant (i.e. have a bigger than 5% probability of occurring for
equal strength programs).
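
For what it's worth, the 38-22 figure can be reproduced with a normal approximation to the binomial. The sketch below is only an illustration under assumptions that are not stated in the posts above: every game is treated as a decisive coin flip between equal programs (draws are ignored, and in practice they reduce the variance), and the test is two-sided with z = 1.96. The function name is made up for this example.

```python
import math

def min_significant_wins(games: int, z: float = 1.96) -> int:
    """Smallest win count that is significant against an equal-strength
    opponent, using the normal approximation to Binomial(games, 0.5).

    Assumptions: no draws, two-sided test, z = 1.96 for the 95% level.
    """
    mean = games / 2.0
    std_dev = math.sqrt(games) / 2.0   # sqrt(n * 0.5 * 0.5)
    return math.ceil(mean + z * std_dev)

for n in (34, 60):
    wins = min_significant_wins(n)
    print(f"{n} games: need at least {wins}-{n - wins}")
```

With these assumptions the threshold for 60 games comes out at 38-22, and for the 34-game match quoted above at 23-11, so 19-15 falls well short. An exact binomial test, or one that accounts for draws, can differ slightly right at the boundary.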

It also tells me that the margin of error does not fall linearly with the number
of games, but with its square root. That is to say, you have to play 4 times as
many games to cut the margin of error in half.
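
A rough sketch of that scaling, again assuming decisive games between equal programs and a 95% level (z = 1.96); the helper names are invented for illustration and are not from any chess tool.

```python
import math

def margin_of_error(games: int, z: float = 1.96) -> float:
    """Half-width of the 95% interval on the score fraction,
    assuming each game is a fair coin flip (no draws)."""
    return z * 0.5 / math.sqrt(games)

def games_needed(margin: float, z: float = 1.96) -> int:
    """Number of games needed to reach the given margin (as a fraction)."""
    return math.ceil((z * 0.5 / margin) ** 2)

# Quadrupling the number of games halves the margin.
for n in (15, 60, 240, 960):
    print(f"{n:4d} games: +/- {100 * margin_of_error(n):.1f}%")

print("games for +/- 2.5%:", games_needed(0.025))
```

Under these assumptions 60 games give roughly +/- 12.7%, not +/- 2.5%, and a +/- 2.5% margin would take on the order of 1500 games. Draws would shrink these numbers somewhat, but not the square-root behaviour.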

Amir




Last modified: Thu, 15 Apr 21 08:11:13 -0700
