# Computer Chess Club Archives

## Messages

### Subject: Re: A question about statistics...

Author: Sandro Necchi

Date: 23:07:08 01/06/04


```
On January 06, 2004 at 04:13:21, Sandro Necchi wrote:

>On January 04, 2004 at 19:22:43, Ricardo Gibert wrote:
>
>>On January 04, 2004 at 14:57:59, Mike Byrne wrote:
>>
>>>On January 04, 2004 at 13:46:48, Ricardo Gibert wrote:
>>>
>>>>On January 04, 2004 at 12:47:25, Peter Berger wrote:
>>>>
>>>>>On January 04, 2004 at 12:40:00, Ricardo Gibert wrote:
>>>>>
>>>>>>On January 04, 2004 at 12:29:15, Mark Young wrote:
>>>>>>
>>>>>>>On January 04, 2004 at 11:46:00, Roger Brown wrote:
>>>>>>>
>>>>>>>>Hello all,
>>>>>>>>
>>>>>>>>I have read numerous posts about the validity - or lack thereof actually - of
>>>>>>>>short matches between and among chess engines.  The arguments of those who say
>>>>>>>>that such matches are meaningless (Kurt Utzinger, Christopher Theron, Robert
>>>>>>>>Hyatt et al) typically indicate that well over 200 games are required to make any
>>>>>>>>sort of statistical statement that engine X is better than engine Y.
>>>>>>>>
>>>>>>>>I concede this point.
>>>>>>>
>>>>>>>If you concede this point you don't understand. There is no magic number like
>>>>>>>200 or 2000. The score must be considered. Here is an example:
>>>>>>>
>>>>>>>A score of 17 - 3 in a 20 game match has a certainty of over 99% that the winner
>>>>>>>of the match is stronger than the loser.
>>>>>>>
>>>>>>>A 100 game match ending 55 - 45 only has a 81% chance that the winner of the
>>>>>>>match is the stronger program.
>>>>>>>
>>>>>>>A 200 game match ending 106 - 94 only has a 78% chance that the winner is
>>>>>>>stronger than the loser.
>>>>>>
>>>>>>
>>>>>>Nothing you have said is really correct because you have ignored the significant
>>>>>>effect of draws in a match.
>>>>>
>>>>>The percentage of draws doesn't matter at all when it is about the conclusion
>>>>>which program is strongest based on the above match results.
>>>>>
>>>>>This has been shown by Remi Coulom and explained in multiple posts
>>>>>here (unfortunately the search engine hasn't found a new home yet).
>>>>>
>>>>>6-0 with 0 draws and 6-0 with 1000 draws has the exact same prediction value
>>>>>when it is about the question which engine is stronger based on a match result.
>>>>
>>>>In this case, the number of decisive games (w+L=6) and margin of victory (w-L=6)
>>>>is the same in both cases so the conclusion they have equal value is correct.
>>>>
>>>>    -------------------------------
>>>>
>>>>In the examples given before, the number of decisive games depends on the number
>>>>of draws, e.g. +17-3=0 and +14-0=6 are not of equal value since the numbers of
>>>>decisive games are not equal.
>>>>
>>>>Let's take a more obvious example. Let's say we play a 1000 game match and I win
>>>>by +20-0=980. I only score 51%, but if we then play a short match, your chances
>>>>of winning such a match are virtually zero, since the longer match has clearly
>>>>demonstrated you couldn't win a game if your life depended on it.
>>>
>>>But if your team needed a half point to win the Olympiad, this is the matchup
>>>you wanted - a half point is a "shoo-in" and you are the champs.  Sometimes a
>>>draw is more important than a win and (in the example I used) is just as good as
>>>a win.
>>>
>>>Let's call the losing program "drawmaster"
>>>
>>>
>>>98% of the games will end in a draw - a coin flip that lands on its edge?
>>>
>>>
>>>
>>>
>>>>
>>>>Now compare this with the alternative possibility. We play a 1000 game match and
>>>>I win +510-490=0. Again 51%. If we now play a short match afterward, the match
>>>>outcome will be very nearly a coin flip.
>>>
>>>Let's call this losing program "win_or_die"
>>>
>>>>
>>>>The first match is very convincing in demonstrating superiority. It is just as
>>>>effective as +20-0=0, as per Remi.
>>>
>>>You may think so, but at the end of the day, Dr Elo will have program
>>>"drawmaster" rated exactly the same as "win_or_die" --- and ratings are what we
>>>were talking about here.  Which program you may want to use may be based on
>>>whether you need the win or a draw: if you need the draw, go with "drawmaster";
>>>if you need the full point, your chances are better with "win_or_die".
>>
>>Ratings are not what I was responding to. Among the many erroneous things Mark
>>Young said, "A 100 game match ending 55 - 45 only has a 81% chance that the
>>winner of the match is the stronger program."  This is a very specific statement
>>dealing with whether a given player is better or not.
>
>Well, if this refers to 2 chess programs playing each other, this figure may be
>optimistic or simply not true.
>With not too many games (<300), a better figure would come from playing at least
>6 different opponents.
>There are cases where a program performs quite well against one opponent, but
>not so well against other programs. The result could change the final
>figure more than you think.
>
>Sandro

Sorry, I was not precise enough, so to let everybody understand I will try to be
clearer:

The chance that a program really is stronger than another, based on a single
match score of 55 to 45, is about 20% and not 81%.
I do not care about statistics, but about real figures based on many tests and
experience. So if you want to know how reliable the result is, then you
get 20%.
The reason is that if you look at the games you will quite probably find
variations which scored quite well (or quite badly), thus putting a big weight
on the final score. This is why it is better to run the same test against other
chess programs; against at least 5 others.

There are only 2 ways to know if a program is better than another one:

1. To play a huge number of games against several opponents; at least 1000
games. Anybody can do this.

2. To look at the games and analyze them. You need to be a strong player to do
this and/or to know chess programs very well.

Sandro

>
>>Nothing to do with ratings
>>in that statement. He _cannot_ provide a figure like "81%" without considering
>>the percentage of games ending in a draw. That's the type of mistake I directed
>>myself towards.
>>
>>>
>>>
>>>>
>>>>The second match is very unconvincing in demonstrating my superiority. It showed
>>>>a game between us is a virtual coin flip.
>>>>
>>>>Draws matter a lot, but you need to understand just how. I'm very familiar with
>>>>what Remi has said on this and it was quite correct. The trouble is people
>>>>misunderstand what he has said.
>>>>
>>>>If you have understood the above, you will then understand that my remark to
>>>>Mark Young was right on the money.
>>>
>>>I understand the above, but you are mixing apples and oranges and in the context
>>>of the discussion taking place, your post was not on the money.   It's really a
>>>different subject (imo) and you just added unneeded confusion to a discussion.
>>>
>>
>>I'm baffled as to why you think I'm mixing apples and oranges. I think you need
>>to read through the thread again more carefully. If you do, you will find I
>>cleared away some misconceptions rather than "...added unneeded confusion..."

```
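
The disagreement in the thread turns on two different numbers: how confident a match result makes us that the winner is stronger, and what overall score (and hence rating) it produces. The sketch below is one common way to formalise the first, not necessarily the exact calculation any poster used. It assumes independent games and a uniform prior on the chance that a decisive game goes to the first engine, so only wins and losses enter and draws drop out. The function names `likelihood_of_superiority` and `score_fraction` are illustrative, not taken from any tool mentioned in the thread.

```python
from math import comb


def likelihood_of_superiority(wins: int, losses: int) -> float:
    """P(first engine is stronger | wins, losses), draws ignored.

    Assumption: uniform prior on p, the chance that a decisive game is won
    by the first engine. The posterior is then Beta(wins + 1, losses + 1),
    and its mass above p = 0.5 equals P(Binomial(n, 1/2) <= wins) with
    n = wins + losses + 1, which is computed exactly with integers here.
    """
    n = wins + losses + 1
    favourable = sum(comb(n, k) for k in range(wins + 1))
    return favourable / 2 ** n


def score_fraction(wins: int, losses: int, draws: int) -> float:
    """Match score as a fraction, counting each draw as half a point."""
    return (wins + 0.5 * draws) / (wins + losses + draws)


if __name__ == "__main__":
    # 6-0 gives the same confidence whether 0 or 1000 draws were played,
    # because draws are not an input to the calculation at all.
    print("6-0       :", likelihood_of_superiority(6, 0))

    # The two 1000-game matches from the thread: identical 51% score,
    # very different confidence that the winner is the stronger engine.
    print("+20-0=980 :", score_fraction(20, 0, 980),
          likelihood_of_superiority(20, 0))
    print("+510-490=0:", score_fraction(510, 490, 0),
          likelihood_of_superiority(510, 490))
```

Under these assumptions both 1000-game matches score 51%, which is why a rating formula driven by score alone treats "drawmaster" and "win_or_die" alike, while the confidence figure depends only on the decisive games. None of this addresses Sandro's separate point that a single pairing may not generalise to other opponents.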