Computer Chess Club Archives

Subject: Re: A question about statistics... To SN

Author: Rolf Tueschen

Date: 06:18:57 01/06/04

```On January 06, 2004 at 04:13:21, Sandro Necchi wrote:

>On January 04, 2004 at 19:22:43, Ricardo Gibert wrote:
>
>>On January 04, 2004 at 14:57:59, Mike Byrne wrote:
>>
>>>On January 04, 2004 at 13:46:48, Ricardo Gibert wrote:
>>>
>>>>On January 04, 2004 at 12:47:25, Peter Berger wrote:
>>>>
>>>>>On January 04, 2004 at 12:40:00, Ricardo Gibert wrote:
>>>>>
>>>>>>On January 04, 2004 at 12:29:15, Mark Young wrote:
>>>>>>
>>>>>>>On January 04, 2004 at 11:46:00, Roger Brown wrote:
>>>>>>>
>>>>>>>>Hello all,
>>>>>>>>
>>>>>>>>I have read numerous posts about the validity - or lack thereof actually - of
>>>>>>>>short matches between and among chess engines.  The arguments of those who say
>>>>>>>>that such matches are meaningless (Kurt Utzinger, Christopher Theron, Robert
>>>>>>>>Hyatt et al.) typically indicate that well over 200 games are required to make
>>>>>>>>any sort of statistical statement that engine X is better than engine Y.
>>>>>>>>
>>>>>>>>I concede this point.
>>>>>>>
>>>>>>>If you concede this point you don't understand. There is no magic number like
>>>>>>>200 or 2000. The score must be considered. Here is an example:
>>>>>>>
>>>>>>>A score of 17 - 3 in a 20 game match has a certainty of over 99% that the winner
>>>>>>>of the match is stronger than the loser.
>>>>>>>
>>>>>>>A 100 game match ending 55 - 45 only has a 81% chance that the winner of the
>>>>>>>match is the stronger program.
>>>>>>>
>>>>>>>A 200 game match ending 106 - 94 only has a 78 % chance that the winner is
>>>>>>>stronger than the loser.
>>>>>>
>>>>>>
>>>>>>Nothing you have said is really correct because you have ignored the significant
>>>>>>effect of draws in a match.
>>>>>
>>>>>The percentage of draws doesn't matter at all when it is about the conclusion
>>>>>which program is strongest based on the above match results.
>>>>>
>>>>>This has been shown by Remi Coulom and explained in multiple posts
>>>>>here (unfortunately the search engine hasn't found a new home yet).
>>>>>
>>>>>6-0 with 0 draws and 6-0 with 1000 draws has the exact same prediction value
>>>>>when it is about the question which engine is stronger based on a match result.
>>>>
>>>>In this case, the number of decisive games (w+L=6) and margin of victory (w-L=6)
>>>>is the same in both cases so the conclusion they have equal value is correct.
>>>>
>>>>    -------------------------------
>>>>
>>>>In the examples given before, the number of decisive games depends on the number
>>>>of draws, e.g. +17-3=0 and +14-0=6 are not of equal value since the numbers
>>>>of decisive games are not equal.
>>>>
>>>>Let's take a more obvious example. Let's say we play a 1000 game match and I win
>>>>by +20-0=980. I only score 51%, but if we then play a short match, your chances
>>>>of winning such a match are virtually zero, since the longer match has clearly
>>>>demonstrated you couldn't win a game if your life depended on it.
>>>
>>>But if your team needed a half point for you to win the Olympiad, this is the
>>>matchup you wanted - a half point is a "shoo-in" and you are the champs.  Sometimes a
>>>draw is more important than a win and (in the example I used) is just as good as
>>>a win.
>>>
>>>Let's call the losing program "drawmaster"
>>>
>>>
>>> 98% of the games will end in draw - a coinflip that lands on the edge?
>>>
>>>
>>>
>>>
>>>>
>>>>Now compare this with the alternative possibility. We play a 1000 game match and
>>>>I win +510-490=0. Again 51%. Now we play a short match afterward, the match
>>>>outcome will be very nearly a virtual coin flip.
>>>
>>>Let's call this losing program "win_or_die"
>>>
>>>>
>>>>The first match is very convincing in demonstrating superiority. It is just as
>>>>effective as +20-0=0 is as per Remi.
>>>
>>>You may think so, but at the end of the day, Dr Elo will have program
>>>"drawmaster" rated exactly the same as "win_or_die" --- and ratings are what we
>>>were talking about here.  Which program you may want to use may be based on
>>>whether you need the win or a draw: if you need the draw, go with "drawmaster";
>>>if you need the full point, your chances are better with "win_or_die".
>>
>>Ratings are not what I was responding to. Among the many erroneous things Mike
>>Young said, "A 100 game match ending 55 - 45 only has a 81% chance that the
>>winner of the match is the stronger program."  This is a very specific statement
>>dealing with whether a given player is better or not.
>
>Well, if referred to 2 chess programs playing each other, this figure may be
>optimistic/not true.

Call it what you want - 81% means _exactly_ that you can't be certain. In
statistics we expect 95% at least.

>A better figure, based on not too many games (<300), would be obtained with at
>least 6 different opponents.
>There are cases where a program performs quite well against another one, but
>does not so well against other programs. The result could change the final
>figure more than you think.

1. First of all the last reflection is correct, but it has nothing to do with
statistics.

2. If you think that you could gain certainty with more opponents, and therefore
_fewer_ games between any two opponents, then you are also totally wrong! Because
fewer games reduce the percentage (see the 81% above) even _more_! BTW this is
exactly what SSDF is doing wrong. They play matches of 40 games maximum and then
argue that 'overall' they already played over 30 000 games and that would prove
and save the validity of the results - which is total nonsense.

3. Summary for your case: if you want to test with _more_ players you
necessarily need _more_ games than between only two opponents! With 6 you need
some thousands to get the same percentage. There is statistically no magic in such

_Please_ refrain from calling other people names and from claiming that they
wouldn't "like" SHREDDER or were 'against' your whole team when they just
criticised the violation of the computerchess rules in Graz - also stop sending
such emails to me.

You won the title _only_ because of these violations and _not_ because you
reached the tie on your own strength! FRITZ is the real winner.

Or are you happy with the point out of a thrown game by your opponent? - - When
it _was_ a clear 3-fold repetition? Please let's come back to a mutual
understanding... and show some respect for reality.

I don't care if you run around and claim that you are the champion. However what
makes me sad is this: I can't accept that people are happy with winning a title
on the basis of thrown games and false TD decisions. Why should I buy a program
that won a title on such a basis?

Why should I buy a program that has no new features that FRITZ doesn't already
have? If I want to fuss around with dozens of progs I have plenty of amateur
engines, so no need to waste more money. Of course I have a version of SHREDDER
but I got it for a dozen bucks in a large super-market. I'm a chessplayer, you
know, and I have not much time to waste and I rely on the machines that are
state of the art. Like FRITZ or CRAFTY.

And I must honestly admit that your month-long campaign as SHREDDER's book author
against those people who only stated the very obvious, calling them unable to
accept the facts and such nonsense, will always cast a shadow over SHREDDER, no
matter how many such titles you win in future.

Fair play isn't just history for me. You know, I beg you to accept that my way
of thinking is NOT fed by the immortal wish to do you harm or some such, but I
personally am unable to even understand why you defend the indefensible. Stefan
in the first place shouldn't have accepted the point against Jonny. Period.

I wish you all the best personally,

Rolf

>
>Sandro
>
>>Nothing to do with ratings
>>in that statement. He _cannot_ provide a figure like "81%" without considering
>>the percentage of games ending in draw. That's the type of mistake I directed
>>myself towards.
>>
>>>
>>>
>>>>
>>>>The second match is very unconvincing in demonstrating my superiority. It showed
>>>>a game between us is a virtual coin flip.
>>>>
>>>>Draws matter a lot, but you need to understand just how. I'm very familiar with
>>>>what Remi has said on this and it was quite correct. The trouble is people
>>>>misunderstand what he has said.
>>>>
>>>>If you have understood the above, you will then understand that my remark to
>>>>Mike Young was right on the money.
>>>
>>>I understand the above, but you are mixing apples and oranges and in the context
>>>of the discussion taking place, your post was not on the money.   It's really a
>>>different subject (imo) and you just added unneeded confusion to a discussion.
>>>
>>
>>I'm baffled as to why you think I'm mixing apples and oranges. I think you need
>>to read through the thread again more carefully. If you do, you will find I
>>cleared away some misconceptions rather than "...added unneeded confusion..."

```
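The disagreement in the thread about draws can be made concrete with a small calculation. The sketch below is one common way to put a number on "which engine is stronger", and it is not necessarily how the 99%, 81% and 78% figures quoted above were obtained. It assumes a uniform prior on the probability `p` that a decisive game goes to the first engine, ignores draws entirely, and returns the posterior probability that `p > 0.5`. The function name `likelihood_of_superiority` is made up for this illustration.

```python
from math import comb


def likelihood_of_superiority(wins: int, losses: int) -> float:
    """Posterior probability that the first engine wins a decisive game
    more often than it loses, given only the decisive results.

    Assumptions (illustrative, not taken from the thread):
    - uniform prior on p, the chance a decisive game is a win;
    - draws carry no information about p and are left out.
    """
    if wins < 0 or losses < 0:
        raise ValueError("wins and losses must be non-negative")
    n = wins + losses
    # The posterior is Beta(wins + 1, losses + 1). Its mass above 1/2 equals
    # the chance of at most `wins` heads in n + 1 fair coin flips, which can
    # be summed exactly with integer arithmetic.
    favourable = sum(comb(n + 1, j) for j in range(wins + 1))
    return favourable / 2 ** (n + 1)


if __name__ == "__main__":
    # (wins, losses) pairs taken from the examples discussed in the thread.
    for wins, losses in [(6, 0), (20, 0), (17, 3), (510, 490)]:
        value = likelihood_of_superiority(wins, losses)
        print(f"+{wins} -{losses}: {value:.4f}")
```

Because draws are not an argument, +6-0=0 and +6-0=1000 give the same value under this model, while +20-0=980 and +510-490=0 give very different values even though both score 51%. A total such as 55 - 45 therefore cannot be evaluated this way until it is known how many of those points came from draws.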