Computer Chess Club Archives

Subject: Re: Proving something is better

Author: Peter Fendrich
Date: 07:27:37 01/02/03
On December 28, 2002 at 15:03:29, Rémi Coulom wrote:

>On December 27, 2002 at 09:41:17, Peter Fendrich wrote:
>>
>>What to do
>>----------
>>I have a few suggestions that I would like to discuss:
>>
>>1) Better utilisation of computer time. If I have time for 20 games it's better
>>to select 10 players and let A and B each meet them.
>>The meaning of better will be better.
>
>My personal use of the statistical test is to measure whether a change in my
>chess program is an improvement or not, in order to decide whether to keep it or
>not. Self-play is certainly not accurate in evaluating the difference in playing
>strength between two close versions of the same program. In particular, it tends
>to overamplify the effect of small differences. But that is its main interest:
>it acts as a magnifying glass to observe the effect of a small change in the
>program.

Yes, if you're using it between versions. I do the same, but only to tell whether
it's worthwhile to go on testing against other opponents.
When testing against other opponents we have a new situation.
As you know, many posters claim all sorts of things based on nothing more than a
match between two players...

>I believe that, given a number of games to play, self-play is more
>likely to give statistically significant results than playing against a pool of
>opponents because of this amplification effect (this belief might be worth
>testing, by the way). Of course, if you obtain statistically significant results
>against 10 different players then it is certainly much more valuable.

I share that belief and ran some small tests a year ago to check it.
I assume, however, that it depends on the program and the type of change.

>Also, note that if you use 10 opponents, you will have 10 games by A and 10
>games by B, whereas self-play would have produced 20 games for each player,
>which, I suppose, would make it easier to reach a better statistical
>significance.

>>
>>2) Use some degree of better, for instance 60% (instead of 50%) as the lower
>>limit. "A beats B with at least 60%" with a probability of x%. It's hard to tell
>>anything about probability against the rest of the population but maybe some a
>>priori distribution can be used.
>>
>>In both cases draws have to be counted because they are part of the question.
>>
>>Peter
>
>Yes, of course, that is a possibility. Unfortunately, the changes I usually make
>to my chess program are so small that proving >50% probability of win is the
>best I can hope for, most of the time!
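
For illustration, the "A beats B with at least 60% (or 50%) with a probability
of x%" statement can be given a number. This is only a sketch under stated
assumptions, not the test either poster actually uses: it takes a uniform a
priori distribution on A's win probability p, counts decisive games only (draws
are ignored), and the function name prob_win_rate_exceeds is made up here. With
w wins and l losses the posterior is Beta(w+1, l+1), and for whole-number counts
its tail equals a binomial sum, so no special library is needed.

```python
import math


def prob_win_rate_exceeds(wins: int, losses: int, threshold: float = 0.5) -> float:
    """Posterior P(p > threshold) after `wins` and `losses` decisive games.

    Assumes a uniform prior on p, so the posterior is Beta(wins+1, losses+1).
    For integer parameters the Beta tail equals a binomial CDF:
        P(p > t) = P(Binomial(wins + losses + 1, t) <= wins)
    """
    if wins < 0 or losses < 0:
        raise ValueError("counts must be non-negative")
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = wins + losses + 1
    return sum(
        math.comb(n, k) * threshold**k * (1.0 - threshold) ** (n - k)
        for k in range(wins + 1)
    )


if __name__ == "__main__":
    # Hypothetical result: 12 wins and 8 losses out of 20 decisive games.
    print("P(p > 50%):", round(prob_win_rate_exceeds(12, 8, 0.5), 3))
    print("P(p > 60%):", round(prob_win_rate_exceeds(12, 8, 0.6), 3))
```

Raising the threshold from 50% to 60% can only lower the probability, which is
why small changes rarely support anything stronger than the >50% claim.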

Well, anything above 50% will do. I'm convinced that leaving out the draws
degrades the conclusions, because information is lost.
One possibility is to turn the question around:
  - What is the highest win-% we can claim at a fixed probability (like 95%)?
    If it's low, we can't possibly know whether the change has any effect at
    all on the population.
I think it would even be possible to find out where the limits are, in general
or for a specific program, and to use that as a lower value.
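
To make the turned-around question concrete, here is a minimal sketch rather
than a definitive method. It assumes each game scores 1, 0.5 or 0, that games
are independent, and that a normal approximation is acceptable for the number
of games played; the function name score_lower_bound and the example counts are
hypothetical. It returns the highest score-% that can be claimed for A at the
chosen one-sided probability.

```python
import math
from statistics import NormalDist


def score_lower_bound(wins: int, draws: int, losses: int,
                      confidence: float = 0.95) -> float:
    """One-sided lower confidence bound on the expected score per game.

    Draws are counted as half a point, so they contribute to both the mean
    and the variance instead of being thrown away.
    """
    n = wins + draws + losses
    if n == 0:
        raise ValueError("at least one game is required")
    mean = (wins + 0.5 * draws) / n
    # Per-game variance of a score taking the values 1, 0.5 and 0.
    variance = (
        wins * (1.0 - mean) ** 2
        + draws * (0.5 - mean) ** 2
        + losses * (0.0 - mean) ** 2
    ) / n
    z = NormalDist().inv_cdf(confidence)  # one-sided normal quantile
    return mean - z * math.sqrt(variance / n)


if __name__ == "__main__":
    # Two hypothetical 20-game matches with the same 50% score.
    print("no draws :", round(score_lower_bound(10, 0, 10), 3))
    print("10 draws :", round(score_lower_bound(5, 10, 5), 3))
```

With the same overall score, the match containing draws gives a tighter bound,
because a draw carries less variance than a win/loss pair. That is the
information lost when draws are not counted.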

Peter