Computer Chess Club Archives




Subject: Re: A question about statistics...

Author: Ricardo Gibert

Date: 10:46:48 01/04/04


On January 04, 2004 at 12:47:25, Peter Berger wrote:

>On January 04, 2004 at 12:40:00, Ricardo Gibert wrote:
>>On January 04, 2004 at 12:29:15, Mark Young wrote:
>>>On January 04, 2004 at 11:46:00, Roger Brown wrote:
>>>>Hello all,
>>>>I have read numerous posts about the validity - or lack thereof actually - of
>>>>short matches between and among chess engines.  The arguments of those who say
>>>>that such matches are meaningless (Kurt Utzinger, Christophe Theron, Robert
>>>>Hyatt et al.) typically indicate that well over 200 games are required to make any
>>>>sort of statistical statement that engine X is better than engine Y.
>>>>I concede this point.
>>>If you concede this point you don't understand. There is no magic number like
>>>200 or 2000. The score must be considered. Here is an example:
>>>A score of 17 - 3 in a 20 game match has a certainty of over 99% that the winner
>>>of the match is stronger than the loser.
>>>A 100 game match ending 55 - 45 only has an 81% chance that the winner of the
>>>match is the stronger program.
>>>A 200 game match ending 106 - 94 only has a 78% chance that the winner is
>>>stronger than the loser.
>>Nothing you have said is really correct because you have ignored the significant
>>effect of draws in a match.
>The percentage of draws doesn't matter at all when it is about the conclusion
>which program is strongest based on the above match results.
>This has been shown by Remi Coulom and explained in multiple posts
>here (unfortunately the search engine hasn't found a new home yet).
>6-0 with 0 draws and 6-0 with 1000 draws have the exact same predictive value
>when the question is which engine is stronger based on a match result.

In this case, the number of decisive games (W+L=6) and the margin of victory
(W-L=6) are the same in both matches, so the conclusion that they have equal
value is correct.


In the examples given before, the number of decisive games depends on the number
of draws. For example, +17-3=0 and +14-0=6 both score 17 - 3, yet they are not of
equal value, since the number of decisive games is not equal (20 versus 14).
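
To put numbers on this, here is a minimal Python sketch. It assumes a simple
Bayesian model that is my own choice for illustration, not necessarily the exact
method from Remi's posts: draws are ignored, p is the chance that the first
player wins a decisive game, and p gets a uniform prior. The function name
prob_first_is_stronger is hypothetical.

```python
from fractions import Fraction
from math import comb


def prob_first_is_stronger(wins, losses, draws=0):
    """Posterior probability that p > 1/2 after `wins` and `losses`.

    Assumed model (illustrative): uniform prior on p, so the posterior is
    Beta(wins + 1, losses + 1). For integer counts its upper tail at 1/2
    reduces to a binomial sum. `draws` is accepted but deliberately unused.
    """
    n = wins + losses  # decisive games only
    favourable = sum(comb(n + 1, j) for j in range(wins + 1))
    return Fraction(favourable, 2 ** (n + 1))


if __name__ == "__main__":
    for w, l, d in [(17, 3, 0), (14, 0, 6), (6, 0, 0), (6, 0, 1000)]:
        print(f"+{w}-{l}={d}: {float(prob_first_is_stronger(w, l, d)):.5f}")
```

Under these assumptions +17-3=0 comes out at about 0.99926 and +14-0=6 at about
0.99997, while 6-0 gives 127/128 whether there are 0 draws or 1000. Both points
hold at once: adding draws to a fixed W and L changes nothing, but two results
with the same score and different numbers of draws have different W and L.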

Let's take a more obvious example. Let's say we play a 1000 game match and I win
by +20-0=980. I only score 51%, but if we then play a short match, your chances
of winning it are virtually zero, since the longer match has clearly
demonstrated you couldn't win a game if your life depended on it.

Now compare this with the alternative possibility. We play a 1000 game match and
I win +510-490=0. Again 51%. If we now play a short match afterward, the
outcome will be very nearly a coin flip.

The first match is very convincing in demonstrating superiority. It is just as
effective as +20-0=0, as per Remi.

The second match is very unconvincing in demonstrating my superiority. It showed
a game between us is a virtual coin flip.
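
The contrast between the two 1000 game matches can be sketched in Python as
well. This is only a rough illustration under assumptions of my own: per-game
win, loss and draw rates are estimated from the long match with add-one
smoothing (so a result that was never observed is not treated as impossible),
games are independent, and the short match is 10 games. The function name
short_match_outcome is hypothetical.

```python
from collections import defaultdict


def short_match_outcome(wins, losses, draws, games=10):
    """Return (P(first wins), P(second wins), P(tied)) for a short match.

    Per-game rates come from the long match with add-one smoothing, which
    is an assumption made for illustration only.
    """
    total = wins + losses + draws
    p_win = (wins + 1) / (total + 3)
    p_loss = (losses + 1) / (total + 3)
    p_draw = (draws + 1) / (total + 3)

    # dist[d] = probability that (wins - losses) equals d so far
    dist = {0: 1.0}
    for _ in range(games):
        nxt = defaultdict(float)
        for diff, prob in dist.items():
            nxt[diff + 1] += prob * p_win
            nxt[diff - 1] += prob * p_loss
            nxt[diff] += prob * p_draw
        dist = nxt

    first = sum(p for d, p in dist.items() if d > 0)
    second = sum(p for d, p in dist.items() if d < 0)
    return first, second, dist.get(0, 0.0)


if __name__ == "__main__":
    for w, l, d in [(20, 0, 980), (510, 490, 0)]:
        first, second, tied = short_match_outcome(w, l, d)
        print(f"+{w}-{l}={d}: first {first:.3f}, second {second:.3f}, tied {tied:.3f}")
```

With these assumptions the loser of +20-0=980 wins a 10 game match less than 1%
of the time, while the loser of +510-490=0 wins it roughly a third of the time,
not far behind the winner. Both long matches are 51% scores.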

Draws matter a lot, but you need to understand just how. I'm very familiar with
what Remi has said on this and it was quite correct. The trouble is people
misunderstand what he has said.

If you have understood the above, you will then understand that my remark to
Mark Young was right on the money.



Last modified: Thu, 07 Jul 11 08:48:38 -0700
