Computer Chess Club Archives




Subject: Re: Proving something is better

Author: Peter Fendrich

Date: 07:27:37 01/02/03


On December 28, 2002 at 15:03:29, Rémi Coulom wrote:

>On December 27, 2002 at 09:41:17, Peter Fendrich wrote:
>>What to do
>>I have a few suggestions that I would like to discuss:
>>1) Better utilisation of computer time. If I have time for 20 games it's better
>>to select 10 players and let A and B each meet them.
>>The meaning of "better" will then be better.
>My personal use of the statistical test is to measure whether a change in my
>chess program is an improvement or not, in order to decide whether to keep it or
>not. Self-play is certainly not accurate in evaluating the difference in playing
>strength between two close versions of the same program. In particular, it tends
>to overamplify the effect of small differences. But that is its main interest:
>it acts as a magnifying glass to observe the effect of a small change in the

Yes, if you're using it between versions. I do the same but only to tell if it's
worthwhile to go on testing against other opponents.
When testing against other opponents we have a new situation.
As you know, many posters claim all sorts of things just based on a match between
two players...

>I believe that, given a number of games to play, self-play is more
>likely to give statistically significant results than playing against a pool of
>opponents because of this amplification effect (this belief might be worth
>testing, by the way). Of course, if you obtain statistically significant results
>against 10 different players then it is certainly much more valuable.

I have the same belief and also ran some small tests to verify it a year ago.
I assume, however, that it depends on the program and the type of change.

>Also, note that if you use 10 opponents, you will have 10 games by A and 10
>games by B, whereas self-play would have produced 20 games for each player,
>which, I suppose, would make it easier to reach a better statistical

>>2) Use some degree of better, for instance 60% (instead of 50%) as the lower
>>limit. "A beats B with at least 60%" with a probability of x%. It's hard to tell
>>anything about probability against the rest of the population but maybe some a
>>priori distribution can be used.
>>In both cases draws have to be counted because they are part of the question.
>Yes, of course, that is a possibility. Unfortunately, the changes I usually make
>to my chess program are so small that proving >50% probability of win is the
>best I can hope, most of the time!
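As a rough illustration of the "A beats B with at least 60%, with a probability of x%" idea with draws counted, here is a minimal Python sketch. It assumes a uniform Dirichlet prior over (win, draw, loss) and estimates the posterior probability by Monte Carlo sampling; the function name, the prior and the sampling approach are assumptions of this sketch, not something proposed in the thread.

```python
import random

def prob_score_at_least(wins, draws, losses, threshold=0.5,
                        samples=50_000, seed=0):
    """Estimate P(expected score >= threshold) for A against B.

    The expected score is p_win + 0.5 * p_draw, so draws are counted.
    Assumption: uniform Dirichlet prior over (win, draw, loss).
    """
    rng = random.Random(seed)
    # With a uniform prior the posterior is Dirichlet(w + 1, d + 1, l + 1).
    alphas = (wins + 1, draws + 1, losses + 1)
    hits = 0
    for _ in range(samples):
        # A Dirichlet sample is a set of independent Gamma draws, normalised.
        g_win, g_draw, g_loss = (rng.gammavariate(a, 1.0) for a in alphas)
        score = (g_win + 0.5 * g_draw) / (g_win + g_draw + g_loss)
        if score >= threshold:
            hits += 1
    return hits / samples
```

For example, `prob_score_at_least(8, 6, 6, threshold=0.6)` gives an estimate of how plausible a 60% score is after 8 wins, 6 draws and 6 losses. A different prior would give different numbers, especially with few games.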

Well, anything above 50% will do. I'm convinced that leaving out the draws lowers
the quality of the conclusions, because information is lost.
One possibility is to turn the question around:
  - What is the highest win-% we can claim at a fixed probability (like 95%)?
    If it's low we can't possibly know if it has any effect at all on the
I think it would even be possible to find out where the limits are, in general or
for a specific program, and to use that as a lower value.
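
The turned-around question can be sketched in Python as a one-sided lower bound on the score percentage, with draws counted as half a point. This is only a sketch under a normal approximation, which is rough for a small number of games; the function name and the choice of approximation are my assumptions here.

```python
import math
from statistics import NormalDist

def score_lower_bound(wins, draws, losses, confidence=0.95):
    """One-sided lower bound on the expected score (win = 1, draw = 0.5).

    Assumption: the mean score over n games is approximately normal.
    """
    n = wins + draws + losses
    if n == 0:
        raise ValueError("need at least one game")
    mean = (wins + 0.5 * draws) / n
    # Per-game variance of the score, which takes the values 1, 0.5 and 0.
    second_moment = (wins + 0.25 * draws) / n
    variance = second_moment - mean * mean
    z = NormalDist().inv_cdf(confidence)  # one-sided quantile
    return mean - z * math.sqrt(variance / n)
```

If `score_lower_bound(wins, draws, losses)` stays below 0.5, the result does not show at that confidence level that the change has any effect. Note that for the same mean score, a result with many draws has a smaller variance than one with only wins and losses, so counting the draws changes the bound.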



Last modified: Thu, 07 Jul 11 08:48:38 -0700
