Computer Chess Club Archives



Subject: Re: Once again reminder: 10 games is NOTHING in comp-comp play...

Author: Ernest Bonnem

Date: 12:56:24 11/23/98


On November 23, 1998 at 15:43:24, Ernest Bonnem wrote:

Forgive me, I missed on my previous try :-(

Uri, I agree, but how do we translate this into probabilities???

SSDF Elo confidence intervals (95%) only use the final scores. But then again,
there are multiple opponents...
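Here is a rough sketch of one way to turn a final score alone into an Elo estimate with a 95% interval. It assumes the standard logistic Elo curve and a normal approximation that treats every game as a win/loss coin toss. It is not necessarily how SSDF computes its intervals, and the function names are made up for illustration:

```python
import math

def elo_diff(score: float) -> float:
    """Elo difference implied by a scoring fraction (logistic Elo curve)."""
    return 400.0 * math.log10(score / (1.0 - score))

def elo_interval_from_score(points: float, games: int, z: float = 1.96):
    """Point estimate and rough 95% interval using only the final score.

    Every game is treated as a win/loss coin toss (binomial variance
    p * (1 - p)), so draws and colours are ignored. Assumes
    0 < points < games.
    """
    p = points / games
    se = math.sqrt(p * (1.0 - p) / games)
    # Clamp the interval ends so the logarithm stays finite.
    lo = max(p - z * se, 1e-9)
    hi = min(p + z * se, 1.0 - 1e-9)
    return elo_diff(p), elo_diff(lo), elo_diff(hi)

# Crafty's 19 points from 34 games against Comet, as reported in this thread.
est, lo, hi = elo_interval_from_score(19, 34)
print(f"Elo difference ~ {est:+.0f}, 95% interval {lo:+.0f} to {hi:+.0f}")
```

For 19 points out of 34 this gives about +41 Elo for Crafty, but with an interval running from below -70 to above +160, which says very little about which program is stronger.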

>On November 23, 1998 at 13:23:34, blass uri wrote:
>
>>
>>On November 23, 1998 at 12:34:49, Amir Ban wrote:
>>
>>>On November 23, 1998 at 11:51:21, Christophe Theron wrote:
>>>
>>>>On November 23, 1998 at 09:42:25, Micheal Cummings wrote:
>>>>
>>>>>
>>>>>On November 23, 1998 at 09:19:42, Jouni Uski wrote:
>>>>>
>>>>>>I started playing Wcrafty 16.1 against Comet A.96. After 10 games
>>>>>>Comet was leading 7.5-2.5. Something is wrong, I thought! But
>>>>>>after 34 games we see the real situation.
>>>>>>
>>>>>>Comet   1 1 1 0.5 1 1 0 0 1 1    (7.5)
>>>>>>Wcrafty 0 0 0 0.5 0 0 1 1 0 0    (2.5)
>>>>>>                                                                   total
>>>>>>Comet   1 0 0 0 0.5 0 1 0 1 0.5 0 0 0 1 0 0 0 0 0 1 0 0 0.5 1      15
>>>>>>Wcrafty 0 1 1 1 0.5 1 0 1 0 0.5 1 1 1 0 1 1 1 1 1 0 1 1 0.5 0      19
>>>>>>
>>>>>>So please, no conclusions after 10 games; we need about 40.
>>>>>>
>>>>>>Jouni
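The swing can be seen directly by printing Comet's running score after each game, using the results exactly as posted above. This is a small illustrative script; the variable names are arbitrary:

```python
from itertools import accumulate

# Comet's per-game results as posted (1 = win, 0.5 = draw, 0 = loss).
first_10 = [1, 1, 1, 0.5, 1, 1, 0, 0, 1, 1]
next_24 = [1, 0, 0, 0, 0.5, 0, 1, 0, 1, 0.5, 0, 0,
           0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0.5, 1]
comet = first_10 + next_24

# Cumulative points after each game.
running = list(accumulate(comet))
for game, total in enumerate(running, start=1):
    pct = 100.0 * total / game
    print(f"after game {game:2d}: Comet {total:4.1f} / {game}  ({pct:5.1f}%)")
```

Comet's share starts at 75% after 10 games and ends at about 44% (15 of 34) after all 34.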
>>>>>
>>>>>
>>>>>You need more than 40, and that is quite a big swing that Crafty made to
>>>>>eventually win over Comet. I do not know what to conclude after those sets of
>>>>>games. I would like to see some of the games, though, to see why there was such
>>>>>a big swing.
>>>>>
>>>>>Not that I do not believe you.
>>>>
>>>>
>>>>It is not a big swing. Run computer matches every day and you will notice this
>>>>all the time.
>>>>
>>>>If you want to check this, flip a coin 30 times, and compute the score of heads
>>>>versus tails after every flip. Notice the swings in the first 20 results.
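Christophe's coin experiment can also be simulated. This sketch assumes a fair coin drawn from Python's `random` module with fixed seeds so each run is reproducible; the particular numbers depend on the seed, and the function name is hypothetical:

```python
import random

def running_coin_score(flips: int, seed: int) -> list[float]:
    """Fraction of heads after each of `flips` fair coin tosses."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    heads = 0
    fractions = []
    for i in range(1, flips + 1):
        heads += rng.random() < 0.5  # True counts as 1 head
        fractions.append(heads / i)
    return fractions

# Compare the running score at 10, 20 and 30 flips for a few seeds.
for seed in range(3):
    fr = running_coin_score(30, seed)
    print(f"seed {seed}: after 10 = {fr[9]:.2f}, "
          f"after 20 = {fr[19]:.2f}, after 30 = {fr[29]:.2f}")
```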
>>>>
>>>>I generally use 60-game matches and consider them to be accurate to +/- 2.5%.
>>>>
>>>>That is, even if prog A scores 52.5% against prog B over 60 games, I consider it
>>>>impossible to say which is better. I say A is better only if it scores above
>>>>52.5%.
>>>>
>>>>For 30-game matches I would take a +/- 5% margin of error.
>>>>
>>>>In the case of the Crafty/Comet match above, the result is 55.9% in favor of
>>>>Crafty over 34 games, so I would conclude that Crafty is better. But you have to
>>>>realize that the confidence in this statement is not high, so if I had to bet I
>>>>would not bet much.
>>>>
>>>>
>>>>    Christophe
>>>
>>>
>>>Christophe, can I borrow your statistics book? My book is much more
>>>pessimistic. It tells me that for 60 games, any result closer than 38-22 is
>>>not 95%-significant (i.e. it has a greater than 5% probability of occurring
>>>between programs of equal strength).
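Amir's 38-22 threshold matches the normal approximation to the binomial, assuming no draws, a two-sided 95% level (z = 1.96) and no continuity correction. A sketch under those assumptions (the function name is hypothetical):

```python
import math

def min_significant_wins(games: int, z: float = 1.96) -> int:
    """Smallest number of wins (no draws) lying at least z standard
    deviations above the 50% expected for equally strong programs.
    Normal approximation to the binomial, no continuity correction."""
    mean = games / 2
    sd = math.sqrt(games) / 2  # sqrt(n * 0.5 * 0.5)
    return math.ceil(mean + z * sd)

for n in (30, 34, 60):
    w = min_significant_wins(n)
    print(f"{n} games: need {w}-{n - w} or better")
```

Under the same assumptions, 34 games would need 23-11, so a 19-15 result falls well short.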
>>
>>The assumption that chess is similar to flipping a coin is not right;
>>you are assuming no draws when you say this.
>>
>>I think that 14 wins, 46 draws and no losses is a significant result, whereas
>>37 wins and 23 losses is not a significant result, if colour is not important.
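Uri's point can be made concrete by estimating the per-game variance from the observed wins, draws and losses instead of assuming a coin toss. A rough sketch using a normal approximation (the function name is made up):

```python
import math

def z_score(wins: int, draws: int, losses: int) -> float:
    """How many standard errors the mean score lies above 50%.

    The per-game variance is estimated from the observed results, so a
    draw-heavy match has a much smaller spread than a decisive one.
    """
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    # E[x^2] - mean^2, with x = 1 for a win, 0.5 for a draw, 0 for a loss.
    var = (wins + 0.25 * draws) / n - mean ** 2
    return (mean - 0.5) / math.sqrt(var / n)

print(f"14 W, 46 D, 0 L: z = {z_score(14, 46, 0):.2f}")
print(f"37 W, 0 D, 23 L: z = {z_score(37, 0, 23):.2f}")
```

Both matches score 37/60, but the draw-heavy one comes out at roughly z = 4.3, while the decisive one is roughly z = 1.9, below the 1.96 needed for 95%.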
>>
>>It is more complicated because we must take colour into consideration.
>>
>>For example, 30 wins with White, plus 7 wins and 23 losses with Black,
>>seems to be a significant result.
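One way to take colour into account, assuming each program gets the same number of games with each colour, is to average the score separately over White and Black games and estimate the variance within each colour. This is a stratified normal approximation, offered as a sketch rather than an established method for chess matches; the function name is hypothetical:

```python
import math

def mean_var(xs: list[float]) -> tuple[float, float]:
    """Sample mean and (population) variance of per-game scores."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / len(xs)
    return m, v

def stratified_z(white: list[float], black: list[float]) -> float:
    """z-score of the colour-balanced mean score against 50%.

    Scores are averaged per colour, so a large first-move advantage does
    not inflate the variance. Divides by zero if every game in both
    colours has the same result.
    """
    mw, vw = mean_var(white)
    mb, vb = mean_var(black)
    mean = (mw + mb) / 2
    se = 0.5 * math.sqrt(vw / len(white) + vb / len(black))
    return (mean - 0.5) / se

# Uri's example: 30 wins with White; 7 wins and 23 losses with Black.
white = [1.0] * 30
black = [1.0] * 7 + [0.0] * 23
print(f"stratified z = {stratified_z(white, black):.2f}")
```

For Uri's example this gives z of about 3.0, whereas pooling all 60 games as 37 wins and 23 losses gives only about 1.9.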
>>
>>Uri
>>
>>
>>>
>>>It also tells me that the margin of error does not fall linearly with the number
>>>of games, but with its square root. That is to say, you have to play 4 times as
>>>many games to cut the margin of error in half.
>>>
>>>Amir
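Amir's square-root rule is easy to check numerically. This sketch assumes a match between equal programs with no draws (per-game variance 0.25) and the 1.96 factor of a 95% normal interval; the function name is hypothetical:

```python
import math

def margin_of_error(games: int, z: float = 1.96) -> float:
    """Half-width of a 95% interval on the score fraction for a match
    between equal programs with no draws (per-game variance 0.25)."""
    return z * 0.5 / math.sqrt(games)

for n in (15, 30, 60, 120, 240):
    print(f"{n:3d} games: +/- {100 * margin_of_error(n):.1f}%")
```

At 60 games this gives about +/- 12.7%, and quadrupling to 240 games halves it to about +/- 6.3%.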



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.