Computer Chess Club Archives

Subject: Re: How many games are needed to find out which program is stronger?

Author: Heiko Mikala
Date: 03:34:19 09/03/99
On September 03, 1999 at 03:16:50, Bruce Moreland wrote:

>On September 02, 1999 at 20:02:55, Heiko Mikala wrote:
>
>>And I say what you're saying is clearly wrong. Believe me, I learned this the
>>hard way during the last ten years of work on my own chess program. I often had
>>the case that in a first test match of about 30-40 games my program convincingly
>>won a match, then let it play another, longer match overnight and during the
>>next day, which it then lost. You always need the same amount of games, no
>>matter how the score is after a first, short match. My experience after hundreds
>>of test matches shows, that you need at least 70-80 games to be able to come to
>>a conclusion. And you need some hundred games to be sure. Even if the first 15
>>games end in a 15-0 score. Because the next 15 games may end 0-15. This is a
>>frustrating fact, but it is *a fact*. It's frustrating, because for us as
>>programmers it means, that we have to do much more time consuming testing than
>>we would like to do.
>
>It shouldn't work like this.  You can't take a selection from somewhere in the
>middle of a long run of games, and use that to prove anything, but if you start
>out and play some games, and one program wins several games in a row, you
>should be able to make a safe conclusion.
>
>I would really like to understand the 15-0 and 0-15 situation.  That should
>*not* happen.  That's not how math should work.  If you flip a coin 15 times and
>it comes up heads each time, the odds of this happening completely by chance are
>extremely small.  The odds that it would then come up tails 15 times in a row
>are also extremely small, and combined they should be vanishingly small.
>
>You can find a run where this happens, with two equal strength programs, but it
>should have to be an extremely large run.
>
>Maybe there is something going on that destroys the randomness of the whole
>thing -- for instance it could be a problem involving a narrow book.
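
For scale, the coin-flip figures in the quoted argument can be worked out
directly. This is only a minimal sketch: it assumes every game is an
independent fair coin flip with no draws, which is a simplification of real
chess matches, and the function name streak_probability is made up for
illustration.

```python
from fractions import Fraction

def streak_probability(n, p=Fraction(1, 2)):
    """Probability that n independent games in a row all go the same given way.

    Assumption: each game is an independent trial with success probability p
    (a fair coin by default); draws are ignored.
    """
    return p ** n

# 15 "heads" in a row (a 15-0 start for one fixed program)
p15 = streak_probability(15)

# 15 heads immediately followed by 15 tails (15-0, then 0-15)
p_both = p15 * streak_probability(15)

print(p15, float(p15))
print(p_both, float(p_both))
```

Under these assumptions a 15-0 start has a chance of 1 in 32768, and 15-0
followed by 0-15 has a chance of 1 in 2^30, so it would indeed take a very long
run of games to see it by chance alone.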

Hi Bruce,

As Harald said, I exaggerated this a bit. The original match result we were
talking about was one with 8 wins in a row, and I have seen this pattern very
often: 5-6 wins by A, followed by some draws and wins by both programs,
followed by 5-6 wins by B.

The reason for this can be books, as you said, but also book learning,
position-based learning, and other things such as bugs (which I'm absolutely
sure are in every program).

This is why I am convinced that playing matches between two chess programs is
not just simple statistics. It's neither totally random nor totally non-random.
There is the randomness of book-move choices, and there are the calculated
moves, which should be reproducible (but often enough are not). And there is the
influence of learning, which can, in extreme cases, turn around the result of a
match after a long series of games. Although I have seen this happen even
between programs which didn't learn at all.

You wrote one sentence, which says exactly what I wanted to say:

>
>You can find a run where this happens, with two equal strength programs, but it
>should have to be an extremely large run.
>

My point was that after 17 games, with games 1 to 9 being 5-4 and games 10 to
17 being 8-0 for program A, you can't draw any conclusions. You will have to
play *many* more games, because this *can* happen even between two equally
strong programs, although it is rare.
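
How rare such a streak is can be estimated with a small calculation. The sketch
below is only an illustration under stated assumptions: games are independent,
program A wins each game with a fixed probability p_win, and anything else (a
loss or a draw) breaks the streak. The name prob_run and its parameters are
hypothetical, and the independence assumption ignores exactly the book and
learning effects described above.

```python
def prob_run(n_games, run_len, p_win):
    """Probability that one fixed program wins at least `run_len` games in a
    row somewhere within `n_games` games.

    Assumptions: games are independent and the program wins each one with
    probability p_win; a draw or a loss resets the streak.
    """
    # state[k] = probability of currently being on a streak of exactly k wins
    # without having reached run_len so far
    state = [0.0] * run_len
    state[0] = 1.0
    hit = 0.0  # probability that the streak has already been reached
    for _ in range(n_games):
        nxt = [0.0] * run_len
        for k, pr in enumerate(state):
            nxt[0] += pr * (1.0 - p_win)      # streak broken
            if k + 1 == run_len:
                hit += pr * p_win             # streak completed
            else:
                nxt[k + 1] += pr * p_win      # streak extended
        state = nxt
    return hit

# Equal programs, no draws: 8 or more wins in a row within 17 games
print(prob_run(17, 8, 0.5))

# Equal programs with a third of the games drawn (an assumed draw rate)
print(prob_run(17, 8, 1.0 / 3.0))
```

With p_win = 0.5 and no draws, a fixed program gets a run of 8 or more wins
somewhere in 17 games roughly 2% of the time under this model, and any draw
rate makes it rarer still. Rare, but far from impossible over many test
matches.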

Greetings,

Heiko.
Last modified: Thu, 15 Apr 21 08:11:13 -0700