Author: blass uri
Date: 10:23:34 11/23/98
On November 23, 1998 at 12:34:49, Amir Ban wrote:

>On November 23, 1998 at 11:51:21, Christophe Theron wrote:
>
>>On November 23, 1998 at 09:42:25, Micheal Cummings wrote:
>>
>>>
>>>On November 23, 1998 at 09:19:42, Jouni Uski wrote:
>>>
>>>>I started to play Wcrafty 16.1 against Comet A.96. After 10 games
>>>>Comet was leading by 7.5 - 2.5. Something is wrong I thought! But
>>>>after 34 games we see real situation.
>>>>
>>>>Comet   1 1 1 0.5 1 1 0 0 1 1 (7.5)
>>>>Wcrafty 0 0 0 0.5 0 0 1 1 0 0 (2.5)
>>>>                                                                  total
>>>>Comet   1 0 0 0 0.5 0 1 0 1 0.5 0 0 0 1 0 0 0 0 0 1 0 0 0.5 1   15
>>>>Wcrafty 0 1 1 1 0.5 1 0 1 0 0.5 1 1 1 0 1 1 1 1 1 0 1 1 0.5 0   19
>>>>
>>>>So please no conclusions after 10 games - we need about 40.
>>>>
>>>>Jouni
>>>
>>>
>>>You need more than 40, and that is quite a big swing that crafty made to
>>>eventually win over comet. I do not know what to conclude after those sets of
>>>games. I would like to see some of the games though to see why there was such
>>>a big swing.
>>>
>>>Not that I do not believe you
>>
>>
>>It is not a big swing. Run computer matches every day and you will notice this
>>all the time.
>>
>>If you want to check this, flip a coin 30 times, and compute the score of head
>>versus tail after every flip. Notice the swings in the 20 first results.
>>
>>I generally use 60 games matches and consider them to be +/- 2.5% accurate.
>>
>>That is, even if prog A scores 52.5% against prog B on 60 games, I consider it
>>is impossible to say which is the best. I say A is better if it scores above
>>52.5%.
>>
>>For 30 games matches I would take a +/- 5% margin of error.
>>
>>In the case of the Crafty/Comet match above, the result is 55.9% in favor of
>>Crafty on 34 games, so I would conclude that Crafty is better. But you have to
>>realize that the confidence on this statement is not high, so if I had to bet I
>>would not bet too much.
>>
>>
>>    Christophe
>
>
>Christophe, can I borrow your statistics book ? My book is much more
>pessimistic. It tells me that for 60 games, all results narrower than 38-22 are
>not 95%-significant (i.e. have a bigger than 5% probability of occurring for
>equal strength programs).

The assumption that chess is similar to flipping a coin is not right: you are
assuming there are no draws when you say this.

I think that 14 wins, 46 draws and no losses is a significant result, while
37 wins and 23 losses is not a significant result, if the colour is not
important.

It is more complicated because we must take the colour into consideration.
For example, 30 wins with white, 7 wins with black and 23 losses with black
seems to be a significant result.

Uri

>
>It also tells me that the margin of error does not fall linearly with the number
>of games, but quadratically. That is to say, you have to play 4 times as many
>games to cut the margin of error in half.
>
>Amir
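
To illustrate the point about draws, here is a rough sketch in Python. It is only one possible way to check this, not anything taken from the posts above: it assumes a two-sided sign test, which throws away the draws and asks how likely the win/loss split would be between equal programs. The function name sign_test_p is made up for this example, and the colour is ignored completely.

```python
from math import comb


def sign_test_p(wins: int, losses: int) -> float:
    """Two-sided exact binomial p-value on decisive games only.

    Assumption for this sketch: draws are dropped, and under the null
    hypothesis each decisive game is a fair coin flip between the programs.
    """
    n = wins + losses
    if n == 0:
        return 1.0  # no decisive games, nothing to test
    k = max(wins, losses)
    # Probability of a split at least this lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # Double it for the two-sided test, capped at 1.
    return min(1.0, 2 * tail)


# The two 60-game results mentioned in the post: (wins, draws, losses).
for wins, draws, losses in [(14, 46, 0), (37, 0, 23)]:
    p = sign_test_p(wins, losses)
    print(f"+{wins} ={draws} -{losses}: p = {p:.4f}")
```

Under these assumptions the first result comes out far below 0.05 and the second one above it, even though the first has a lower total score (37 points against 37 points are equal here, but the spread is very different). A test that also uses the colour would need the white and black results separately.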
Last modified: Thu, 15 Apr 21 08:11:13 -0700