Computer Chess Club Archives


Subject: Re: How many games are needed to find out which program is stronger?

Author: Dave Gomboc

Date: 23:59:26 09/02/99


On September 02, 1999 at 20:02:55, Heiko Mikala wrote:

>On September 02, 1999 at 18:58:24, Dave Gomboc wrote:
>
>>On September 02, 1999 at 18:20:22, Heiko Mikala wrote:
>>
>>>On September 02, 1999 at 15:52:17, Dave Gomboc wrote:
>>>
>>>
>>>>I don't think that you addressed my point, namely:
>>>>
>>>>  Fewer games are required to conclude with a certain confidence
>>>>  that one program is better than another when the results are
>>>>  lopsided than when they are not.
>>>
>>>Oh yes, I think I did exactly address your point.
>>>
>>>Look at these two match-fragments:
>>>
>>>    1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
>>>A   ½ 0 ½ 1 ½ 1 0 1 ½ 1 1 1 1 1 1   11.0/15
>>>B   ½ 1 ½ 0 ½ 0 1 0 ½ 0 0 0 0 0 0    4.0/15
>>>
>>>
>>>    1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
>>>A   1 1 ½ 1 0 0 1 0 0 1 1 1 1 1 1   10.5/15
>>>B   0 0 ½ 0 1 1 0 1 1 0 0 0 0 0 0    4.5/15
>>>
>>>They both look very similar, don't they? So, if I understand you correctly, you
>>>would conclude from both matches, that engine A will most definitely be stronger
>>>than engine B, because the results are *very* "lopsided".
>>
>>You don't understand me correctly.
>>
>>I am saying that a match where program A scores 60% versus program B must be
>>much longer than a match where program A scores 80% versus program B before one
>>can conclude with the same confidence level (e.g. 0.95) that A is a stronger
>>player than B.
>>
>>I hope that was precise enough.
>
>And I say what you're saying is clearly wrong. Believe me, I learned this the
>hard way during the last ten years of work on my own chess program. I often had
>the case that my program convincingly won a first test match of about 30-40
>games, then I let it play another, longer match overnight and during the next
>day, which it then lost. You always need the same number of games, no matter
>what the score is after a first, short match. My experience after hundreds of
>test matches shows that you need at least 70-80 games to be able to come to a
>conclusion. And you need some hundred games to be sure. Even if the first 15
>games end in a 15-0 score. Because the next 15 games may end 0-15. This is a
>frustrating fact, but it is *a fact*. It's frustrating because for us as
>programmers it means that we have to do much more time-consuming testing than
>we would like to do.
>
>Ask other programmers, how long their test matches are. You won't find many, who
>will tell you that their matches are only 20 or 30 games. Guess why.
>
>To say it in your words (to be precise ;-):
>
>
>  I am saying that a match where program A scores 60% versus program B must be
>  *as long as* a match where program A scores 80% versus program B before one
>  can conclude with the same confidence level (e.g. 0.95) that A is a stronger
>  player than B.
>
>
>Try it yourself. Play some long matches between programs. Don't stop after 20 or
>40 games but play on.
>
>Or, if you don't believe me, look at the published match results from other
>people here in this forum for example. You will easily find examples, where one
>tester has an 80% score for program A, while another tester has an 80% score for
>program B. That's one of the reasons for many fights that have taken place here
>at the CCC! And because many people still don't believe, that they need hundreds
>of games to be absolutely sure, they accuse each other of cheating.
>
>Coming back to the examples of my last post. In both match fragments program A
>scored above 70% (73% and 70%, fairly close to your 80%). So would you conclude
>that you could end these matches earlier than if program A had scored only 60%?
>I wouldn't. And even after 73 games in the second match, I still wasn't
>convinced that program A was really stronger than B.
>
>
>Heiko.

Crack open an intro to statistics book.
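
For anyone who wants to check the 60% versus 80% claim quoted above numerically, here is a minimal sketch in Python. It is only a rough one-sided z-test under assumptions that are not from this thread: games are independent, the null hypothesis is "equal strength" (expected score 0.5), and the match score is treated as approximately normal. The names games_needed and draw_rate are made up for illustration.

```python
import math
from statistics import NormalDist


def games_needed(score, confidence=0.95, draw_rate=0.0):
    """Smallest match length at which an observed score is significant.

    Assumptions (not from the thread): independent games, a null
    hypothesis of equal strength (expected score 0.5), and a normal
    approximation to the match score. draw_rate is the assumed fraction
    of drawn games under the null; 0.0 is the most conservative choice.
    """
    if not 0.5 < score <= 1.0:
        raise ValueError("score must be in (0.5, 1.0]")
    if not 0.0 <= draw_rate < 1.0:
        raise ValueError("draw_rate must be in [0.0, 1.0)")
    # Per-game standard deviation under the null (a game yields 0, 0.5 or 1).
    sigma = math.sqrt(0.25 - draw_rate / 4.0)
    # One-sided critical value, about 1.645 for confidence 0.95.
    z = NormalDist().inv_cdf(confidence)
    # Require (score - 0.5) * sqrt(n) / sigma >= z and solve for n.
    return math.ceil((z * sigma / (score - 0.5)) ** 2)


if __name__ == "__main__":
    for s in (0.6, 0.8):
        print(f"{s:.0%} score: about {games_needed(s)} games")
```

Under these assumptions a 60% score needs about 68 games and an 80% score about 8. The normal approximation is crude for matches that short, so an exact binomial test would be the safer tool for small counts, but the direction of the effect is the same: the more lopsided the score, the fewer games are needed at a given confidence level.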

Dave



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.