Author: Christophe Theron
Date: 23:12:01 11/23/98
On November 23, 1998 at 12:34:49, Amir Ban wrote:
>On November 23, 1998 at 11:51:21, Christophe Theron wrote:
>
(snip)
>>I generally use 60 games matches and consider them to be +/- 2.5% accurate.
>>
>>That is, even if prog A scores 52.5% against prog B on 60 games, I consider it
>>is impossible to say which is the best. I say A is better if it scores above
>>52.5%.
>>
>>For 30 games matches I would take a +/- 5% margin of error.
>>
>>In the case of the Crafty/Comet match above, the result is 55.9% in favor of
>>Crafty on 34 games, so I would conclude that Crafty is better. But you have to
>>realize that the confidence on this statement is not high, so if I had to bet I
>>would not bet too much.
>>
>>
>> Christophe
>
>
>Christophe, can I borrow your statistics book ? My book is much more
>pessimistic. It tells me that for 60 games, all results narrower than 38-22 are
>not 95%-significant (i.e. have a bigger than 5% probability of occurring for
>equal strength programs).
>
>It also tells me that the margin of error does not fall linearly with the number
>of games, but quadratically. That is to say, you have to play 4 times as many
>games to cut the margin of error in half.
You are right. My way of deciding which program is best is rather risky.
I would not recommend these numbers to everyone. They are just practical values
I use, because of practical constraints:
To get more confidence, I would have to play more games, which means:
1) play games with faster time controls
or
2) wait longer when I test a change in my code
or
3) use more computers at the same time to run my tests
It is hard to do #1, because the time controls I use are already much faster
than 40 moves in 2 hours.
I hate to do #2, because I have a lot of ideas to test, so I cannot wait several
days to get a result. I once estimated the time needed to fully explore the
ideas I had and to find optimal values for several parameters. I realized that
even if I could test 24 hours a day it would take me several months. At the
same time, Mr Ban, Morsch, Lang(?), Donninger, De Koning, Uniacke,... to name a
few, are also working hard on their programs. It's a race and I cannot spend my
time waiting for results.
I could do #3 if I had more money. At the moment I only have 2 PCs running my
test sessions (one P100 and one P200), and I work on a third PC. My home is not
large, so it's not only a problem of financial resources. For example, I have a
PC near my bed running test sessions all night long. Another funny problem here
in Guadeloupe, under tropical conditions, is that the computers heat up my house
a lot.
So all in all I think you can keep and trust your statistics book.
Mine is full of mistakes, but is still able to guide me (most of the time)
in the right direction... :)
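
As a rough check of the figures quoted above, here is a minimal sketch in Python
of the usual normal approximation. It assumes every game is an independent
win/loss coin flip between equal programs (no draws), which is a simplification.
The names margin_of_error and min_significant_points are only illustrative.

```python
import math

Z_95 = 1.96  # two-sided 95% critical value of the standard normal distribution


def margin_of_error(n_games, z=Z_95):
    """Half-width of the 95% interval on the score fraction of equal programs.

    Assumption: each game is an independent win/loss coin flip (no draws),
    so a single game has a standard deviation of 0.5.
    """
    return z * 0.5 / math.sqrt(n_games)


def min_significant_points(n_games, z=Z_95):
    """Smallest whole-point score that falls outside the 95% interval."""
    return math.ceil(n_games / 2 + z * math.sqrt(n_games) / 2)


if __name__ == "__main__":
    for n in (30, 60, 240):
        # Print games played, margin in percent, and the score needed.
        print(n, round(100 * margin_of_error(n), 1), min_significant_points(n))
```

Under these assumptions, 60 games give a margin of about +/- 12.7% and a
threshold of 38 points, which matches the 38-22 figure, and 240 games are needed
to cut that margin in half. Draws reduce the variance of each game, so with
real results the margin would be somewhat smaller than this sketch suggests.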
Christophe
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.