# Computer Chess Club Archives

## Messages

### Subject: Re: Proving something is better

Author: Rémi Coulom

Date: 12:03:29 12/28/02

```
On December 27, 2002 at 09:41:17, Peter Fendrich wrote:
>
>What to do
>----------
>I have a few suggestions that I would like to discuss:
>
>1) Better utilisation of computer time. If I have time for 20 games it's better
>to select 10 players and let A and B meet them respectively.
>The meaning of better will be better.

My personal use of the statistical test is to measure whether a change in my
chess program is an improvement or not, in order to decide whether to keep it or
not. Self-play is certainly not accurate in evaluating the difference in playing
strength between two close versions of the same program. In particular, it tends
to overamplify the effect of small differences. But that is its main interest:
it acts as a magnifying glass to observe the effect of a small change in the
program. I believe that, given a number of games to play, self-play is more
likely to give statistically significant results than playing against a pool of
opponents because of this amplification effect (this belief might be worth
testing, by the way). Of course, if you obtain statistically significant results
against 10 different players then it is certainly much more valuable.

Also, note that if you use 10 opponents, you will have 10 games by A and 10
games by B, whereas self-play would have produced 20 games for each player,
which, I suppose, would make it easier to reach a better statistical
significance.

>
>2) Use some degree of better, for instance 60% (instead of 50%) as the lower
>limit. "A beats B with at least 60%" with a probability of x%. It's hard to tell
>anything about probability against the rest of the population but maybe some a
>priori distribution can be used.
>
>In both cases draws have to be counted because they are part of the question.
>
>Peter

Yes, of course, that is a possibility. Unfortunately, the changes I usually make
to my chess program are so small that proving a >50% probability of winning is
the best I can hope for, most of the time!

Rémi

```
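To make the discussion of statistical significance above concrete, here is a minimal sketch of one common way to test whether A scores better than a given threshold against B: a one-sided exact binomial test on decisive games. This is an illustrative assumption, not necessarily the test either poster uses. The function name `one_sided_p_value`, the parameter `p0`, and the game counts are hypothetical, and draws are simply left out here, which is a simplification (Peter's point that draws matter for a 60% threshold is not addressed by this sketch).

```python
from math import comb


def one_sided_p_value(wins: int, losses: int, p0: float = 0.5) -> float:
    """Probability of seeing at least `wins` wins in wins + losses decisive
    games if A's true win probability were exactly p0 (draws are ignored,
    which is an assumption of this sketch)."""
    n = wins + losses
    return sum(
        comb(n, k) * p0**k * (1 - p0) ** (n - k)
        for k in range(wins, n + 1)
    )


# Hypothetical results: A wins 12 and loses 8 out of 20 decisive games.
print(one_sided_p_value(12, 8))

# Same 60% win rate, but over 100 decisive games.
print(one_sided_p_value(60, 40))

# Testing against a 60% threshold instead of 50% (suggestion 2 above).
print(one_sided_p_value(60, 40, p0=0.6))
```

With these made-up numbers, a 12-8 result gives a p-value of about 0.25, so 20 games are far from enough to conclude that A is better, while the same win rate over 100 games gives about 0.03. This shows why small changes need many games, and why raising the threshold from 50% to 60% makes a significant result even harder to reach.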