Author: Amir Ban
Date: 09:34:49 11/23/98
On November 23, 1998 at 11:51:21, Christophe Theron wrote:

>On November 23, 1998 at 09:42:25, Micheal Cummings wrote:
>
>>
>>On November 23, 1998 at 09:19:42, Jouni Uski wrote:
>>
>>>I started to play Wcrafty 16.1 against Comet A.96. After 10 games
>>>Comet was leading by 7.5 - 2.5. Something is wrong, I thought! But
>>>after 34 games we see the real situation.
>>>
>>>Comet   1 1 1 0.5 1 1 0 0 1 1  (7.5)
>>>Wcrafty 0 0 0 0.5 0 0 1 1 0 0  (2.5)
>>>
>>>Comet   1 0 0 0 0.5 0 1 0 1 0.5 0 0 0 1 0 0 0 0 0 1 0 0 0.5 1  total 15
>>>Wcrafty 0 1 1 1 0.5 1 0 1 0 0.5 1 1 1 0 1 1 1 1 1 0 1 1 0.5 0  total 19
>>>
>>>So please no conclusions after 10 games - we need about 40.
>>>
>>>Jouni
>>
>>
>>You need more than 40, and that is quite a big swing that Crafty made to
>>eventually win over Comet. I do not know what to conclude after those sets of
>>games. I would like to see some of the games, though, to see why there was such
>>a big swing.
>>
>>Not that I do not believe you
>
>
>It is not a big swing. Run computer matches every day and you will notice this
>all the time.
>
>If you want to check this, flip a coin 30 times, and compute the score of heads
>versus tails after every flip. Notice the swings in the first 20 results.
>
>I generally use 60-game matches and consider them to be +/- 2.5% accurate.
>
>That is, even if prog A scores 52.5% against prog B on 60 games, I consider it
>impossible to say which is the best. I say A is better if it scores above
>52.5%.
>
>For 30-game matches I would take a +/- 5% margin of error.
>
>In the case of the Crafty/Comet match above, the result is 55.9% in favor of
>Crafty on 34 games, so I would conclude that Crafty is better. But you have to
>realize that the confidence in this statement is not high, so if I had to bet I
>would not bet too much.
>
>
>
>Christophe

Christophe, can I borrow your statistics book?

My book is much more pessimistic. It tells me that for 60 games, no result closer than 38-22 is significant at the 95% level (i.e., a result at least that lopsided has more than a 5% probability of occurring between programs of equal strength). It also tells me that the margin of error does not fall linearly with the number of games, but only with its square root. That is to say, you have to play 4 times as many games to cut the margin of error in half.

Amir
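Both claims in Amir's reply can be checked with a short calculation. The Python sketch below is not from the original post and rests on stated assumptions: every game is treated as decisive (draws ignored, which is the worst case for variance), "equal strength" means a 50% chance of winning each game, and significance is a two-sided test at the 5% level. The helper names `two_sided_p_value`, `min_significant_wins` and `margin_of_error` are hypothetical. Under these assumptions an exact binomial test gives p slightly above 0.05 for 38-22 and below 0.05 for 39-21, which agrees with the statement that nothing closer than 38-22 is significant, and the margin of error shrinks as 1/sqrt(games).

```python
import math
from statistics import NormalDist
from typing import Optional


def two_sided_p_value(wins: int, games: int) -> float:
    """Exact two-sided binomial test of `wins` out of `games` against p = 0.5.

    Assumption (not from the post): every game is decisive, and the null
    hypothesis is that both programs are of equal strength.
    """
    k = max(wins, games - wins)  # size of the more lopsided side
    upper_tail = sum(math.comb(games, i) for i in range(k, games + 1)) / 2**games
    return min(1.0, 2 * upper_tail)


def min_significant_wins(games: int, alpha: float = 0.05) -> Optional[int]:
    """Smallest winning score that is significant at level `alpha`."""
    for wins in range(math.ceil(games / 2), games + 1):
        if two_sided_p_value(wins, games) < alpha:
            return wins
    return None


def margin_of_error(games: int, confidence: float = 0.95) -> float:
    """Normal-approximation margin of error on the score fraction.

    Uses the worst-case per-game variance of 0.25 (p = 0.5, no draws),
    so the margin shrinks like 1 / sqrt(games).
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * 0.5 / math.sqrt(games)


if __name__ == "__main__":
    n = 60
    w = min_significant_wins(n)
    print(f"{n} games: need at least {w}-{n - w} for 95% significance")
    for g in (30, 60, 240):
        print(f"{g:4d} games: +/- {100 * margin_of_error(g):.1f}%")
```

Run as a script, it reports 39-21 as the smallest significant 60-game result and a 95% margin of roughly +/- 12.7% for 60 games, which only halves (to about +/- 6.3%) at 240 games. Draws lower the per-game variance, so real margins are somewhat narrower than this worst case.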
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.