Computer Chess Club Archives




Subject: Re: A question about statistics...

Author: Mike Byrne

Date: 11:57:59 01/04/04


On January 04, 2004 at 13:46:48, Ricardo Gibert wrote:

>On January 04, 2004 at 12:47:25, Peter Berger wrote:
>>On January 04, 2004 at 12:40:00, Ricardo Gibert wrote:
>>>On January 04, 2004 at 12:29:15, Mark Young wrote:
>>>>On January 04, 2004 at 11:46:00, Roger Brown wrote:
>>>>>Hello all,
>>>>>I have read numerous posts about the validity - or lack thereof actually - of
>>>>>short matches between and among chess engines.  The arguments of those who say
>>>>>that such matches are meaningless (Kurt Utzinger, Christopher Theron, Robert
>>>>>Hyatt et al) typically indicate that well over 200 games are required to make any
>>>>>sort of statistical statement that engine X is better than engine Y.
>>>>>I concede this point.
>>>>If you concede this point you don't understand. There is no magic number like
>>>>200 or 2000. The score must be considered. Here is an example:
>>>>A score of 17 - 3 in a 20 game match has a certainty of over 99% that the winner
>>>>of the match is stronger than the loser.
>>>>A 100 game match ending 55 - 45 only has an 81% chance that the winner of the
>>>>match is the stronger program.
>>>>A 200 game match ending 106 - 94 only has a 78% chance that the winner is
>>>>stronger than the loser.
>>>Nothing you have said is really correct because you have ignored the significant
>>>effect of draws in a match.
>>The percentage of draws doesn't matter at all when it comes to concluding
>>which program is stronger based on the above match results.
>>This has been shown by Remi Coulom and explained in multiple posts
>>here (unfortunately the search engine hasn't found a new home yet).
>>6-0 with 0 draws and 6-0 with 1000 draws have exactly the same predictive value
>>when the question is which engine is stronger based on a match result.
>In this case, the number of decisive games (W+L=6) and margin of victory (W-L=6)
>are the same in both cases, so the conclusion that they have equal value is correct.
>    -------------------------------
>In the examples given before, the number of decisive games depends on the number
>of draws, e.g. +17-3=0 and +14-0=6 are not of equal value since the numbers of
>decisive games are not equal.
>Let's take a more obvious example. Let's say we play a 1000 game match and I win
>by +20-0=980. I only score 51%, but if we then play a short match, your chances
>of winning such a match are virtually zero, since the longer match has clearly
>demonstrated you couldn't win a game if your life depended on it.

But if your team needed a half point for you to win the Olympiad, this is the
matchup you would want - a half point is a "shoo-in" and you are the champs.
Sometimes a draw is more important than a win and (in the example I used) is
just as good as a win.

Let's call the losing program "drawmaster"

98% of the games will end in a draw - a coin flip that lands on its edge?

>Now compare this with the alternative possibility. We play a 1000 game match and
>I win +510-490=0. Again 51%. If we now play a short match afterward, the match
>outcome will be very nearly a coin flip.

Let's call this losing program "win_or_die"

>The first match is very convincing in demonstrating superiority. It is just as
>effective as +20-0=0, as per Remi.

You may think so, but at the end of the day, Dr Elo will have program
"drawmaster" rated exactly the same as "win_or_die" --- and ratings are what we
were talking about here.  Which program you want to use may depend on
whether you need the win or a draw: if you need the draw, go with "drawmaster";
if you need the full point, your chances are better with "win_or_die".
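
The rating point can be checked with a few lines of Python. This is only a sketch: it assumes the common logistic approximation of the Elo expectancy curve (not necessarily what any particular rating list uses), and it treats the observed match frequencies as per-game probabilities, which is a strong simplification. The function names are made up for illustration.

```python
import math

def score_fraction(wins, losses, draws):
    """Fraction of available points scored (a draw counts as half a point)."""
    games = wins + losses + draws
    return (wins + 0.5 * draws) / games

def elo_difference(score):
    """Rating difference implied by a score fraction.

    Assumption: logistic Elo curve, D = 400 * log10(s / (1 - s)).
    Only defined for 0 < score < 1.
    """
    return 400.0 * math.log10(score / (1.0 - score))

def per_game_chances(wins, losses, draws):
    """Naive per-game estimates taken straight from the match frequencies."""
    games = wins + losses + draws
    return {
        "win": wins / games,
        "at_least_draw": (wins + draws) / games,
    }

# Records from the losing side's point of view, as (wins, losses, draws).
records = {
    "drawmaster": (0, 20, 980),   # loser of the +20-0=980 match
    "win_or_die": (490, 510, 0),  # loser of the +510-490=0 match
}

for name, (w, l, d) in records.items():
    s = score_fraction(w, l, d)
    chances = per_game_chances(w, l, d)
    print(f"{name}: score {s:.2%}, Elo diff {elo_difference(s):+.1f}, "
          f"P(win) {chances['win']:.2f}, "
          f"P(at least a draw) {chances['at_least_draw']:.2f}")
```

Both programs score 49% and so get the same implied rating difference, yet under these naive estimates their chances of taking a half point versus a full point in a single game are very different.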

>The second match is very unconvincing in demonstrating my superiority. It showed
>a game between us is a virtual coin flip.
>Draws matter a lot, but you need to understand just how. I'm very familiar with
>what Remi has said on this and it was quite correct. The trouble is people
>misunderstand what he has said.
>If you have understood the above, you will then understand that my remark to
>Mark Young was right on the money.

I understand the above, but you are mixing apples and oranges, and in the context
of the discussion taking place, your post was not on the money.  It's really a
different subject (imo) and you just added unneeded confusion to the discussion.
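
For anyone who wants to play with the "which engine is stronger" numbers quoted above, here is a minimal sketch of a likelihood-of-superiority calculation. It assumes a uniform prior on the winner's share of decisive games, independent games, and that draws carry no information about who is stronger. That is in the spirit of the argument attributed to Remi in this thread, but it is not necessarily the exact formula he used, and it need not reproduce the percentages quoted earlier. The function name is hypothetical.

```python
from math import comb

def likelihood_of_superiority(wins, losses, draws=0):
    """P(winner's true share of decisive games > 1/2) given the result.

    Assumptions: uniform prior on that share, independent games, and draws
    ignored (the `draws` argument is accepted only to make that explicit).
    With a Beta(wins + 1, losses + 1) posterior, the tail probability above
    1/2 equals P(X <= wins) for X ~ Binomial(wins + losses + 1, 1/2).
    """
    n = wins + losses + 1
    favourable = sum(comb(n, k) for k in range(wins + 1))
    return favourable / 2 ** n

# Results discussed in the thread, as (wins, losses, draws).
for w, l, d in [(6, 0, 0), (6, 0, 1000), (20, 0, 980), (510, 490, 0), (17, 3, 0)]:
    print(f"+{w}-{l}={d}: {likelihood_of_superiority(w, l, d):.4f}")
```

Under these assumptions 6-0 gives the same value with 0 or 1000 draws, and +20-0=980 comes out far more convincing than +510-490=0 even though both are 51% scores. That is a statement about who is stronger, not about the rating difference.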



Last modified: Thu, 15 Apr 21 08:11:13 -0700
