Computer Chess Club Archives



Subject: Re: Junior - Crafty NPS Challenge - a user experiment

Author: Robert Hyatt

Date: 11:07:27 11/24/03



On November 24, 2003 at 12:59:39, Rémi Coulom wrote:

>On November 23, 2003 at 19:26:35, Sune Fischer wrote:
>
>>On November 23, 2003 at 14:31:43, Dieter Buerssner wrote:
>>
>>>On November 22, 2003 at 20:01:27, Robert Hyatt wrote:
>>>
>>>>I disagree.
>>>
>>>Ditto
>>>
>>>>6-0-0 vs 6-0-1000 are way different results.  And the
>>>>rating and rating error bar would be far different.
>>>
>>>I tried to explain that the Elo rating is not an objective measure of the
>>>likelihood that one player is better.
>>
>>Elo doesn't try to measure the likelihood at all; that is the problem.
>>
>>Anyone who understands real world numbers knows that they don't make a lot of
>>sense without some knowledge of their tolerance.
>>
>>>>With a 6-0 result
>>>>I would conclude the 6 side is significantly better.  With 6 wins and
>>>>1000 draws I would not conclude _either_ was better with any confidence.
>>>
>>>Both results are identical with respect to the likelihood of who is better.
>>
>>Without having read his paper, I'd say that a 6-0 score indicates the winner is
>>far better than the loser, but the confidence is very low.
>>
>>Whereas a 10006-10000 result indicates the players are almost equal with a very
>>high confidence.
>>
>>I don't think that necessarily contradicts what you say, though.
>
>No, it does not.
>
>>
>>>If I cannot convince you, perhaps have a look at Rémi Coulom's paper, available
>>>from http://remi.coulom.free.fr/ (inside
>>>http://remi.coulom.free.fr/WhoIsBest.zip). One quote from that paper:
>>>
>>>"This proves that the likelihood that the first player is best does not depend
>>>on the number of draws."
>>
>>Something to read tonight perhaps :)
>
>The paper is a bit mathematical, but the fact that the likelihood does not
>depend on the number of draws can be explained intuitively rather easily:
>imagine a game called "chess+" where no draw is possible: each time a game is
>drawn, the two players start over from the initial position until one player
>wins. Draws are not counted. For the exact same sequence of games, depending on
>whether you consider they play chess or chess+, the score will be 1006-1000 or
>6-0. Obviously, the likelihood that one is better than the other is the same.

I _totally_ disagree with that.  Say we play tennis matches, with no tie-breaks.
We play 1000 sets and they all end 6-6.  Then I win the 1001st set.  Do you really
conclude that provides no more information about our skills than a single game
that ends 5-7?  The 1000 ties suggest a _lot_ about how close we are, while
the 1 set says very little.

Draws count.  That's why the Elo formula specifically includes draws in the
calculation...

>
>Of course, this is true only if the hypotheses are true: games are independent
>random events, and the prior is uniform (which is reasonable in comp-comp
>matches without learning).
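
Rémi's claim can be checked with a minimal sketch under exactly the hypotheses he lists, plus one labeled assumption: each game is an independent draw from fixed (win, draw, loss) probabilities, and the uniform prior is taken to be Dirichlet(1, 1, 1) over those three probabilities. This may not be the precise model in his paper, and `likelihood_of_superiority` is a hypothetical name used only for illustration.

```python
from fractions import Fraction
from math import comb


def likelihood_of_superiority(wins: int, draws: int, losses: int) -> Fraction:
    """Posterior probability that p_win > p_loss for the first player.

    Assumes independent games with fixed (win, draw, loss) probabilities and a
    uniform Dirichlet(1, 1, 1) prior, so the posterior is
    Dirichlet(wins + 1, draws + 1, losses + 1).  The share
    p_win / (p_win + p_loss) is then Beta(wins + 1, losses + 1) and does not
    involve the draw count at all; `draws` is accepted only to make that visible.
    """
    # For integer a, b: P(Beta(a, b) > 1/2) = P(Binomial(a + b - 1, 1/2) <= a - 1).
    # Here a = wins + 1 and b = losses + 1.
    n = wins + losses + 1
    favourable = sum(comb(n, k) for k in range(wins + 1))
    return Fraction(favourable, 2 ** n)


print(float(likelihood_of_superiority(6, 0, 0)))     # 0.9921875
print(float(likelihood_of_superiority(6, 1000, 0)))  # 0.9921875
```

Under these assumptions 6-0 and 6-0-1000 give the same answer, 127/128. Note that this number only answers "who is better", not "by how much", which is the distinction argued over below.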


This seems to fail a simple sampling test.  You have 1006 games to choose from.
Choose N random samples of 6 games.  Most of your random samples will contain
only draws, so you will be convinced that the two players are very close, even
though one won 6 more games than the other.  Now take the 6 won games by
themselves.  The only 6-game sample you can take is 6-0, which suggests that
the 6-side is way better.

I.e., this could be tested pretty easily with a Monte Carlo approach...
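
A minimal Monte Carlo sketch of the sampling experiment above, assuming the 1006-game record is 6 wins, 1000 draws and 0 losses, and that each 6-game sample is drawn without replacement. The function name, trial count and seed are arbitrary choices for illustration.

```python
import random
from collections import Counter
from math import comb


def sample_six_game_scores(wins=6, draws=1000, losses=0, trials=20_000, seed=1):
    """Draw random 6-game subsets (without replacement) from one match record
    and count how often each (wins, draws, losses) line comes up."""
    record = ["W"] * wins + ["D"] * draws + ["L"] * losses
    rng = random.Random(seed)
    tally = Counter()
    for _ in range(trials):
        sample = rng.sample(record, 6)
        tally[(sample.count("W"), sample.count("D"), sample.count("L"))] += 1
    return tally


tally = sample_six_game_scores()
share_all_draws = tally[(0, 6, 0)] / sum(tally.values())
# Exact (hypergeometric) chance that a 6-game subset contains no decisive game.
exact_all_draws = comb(1000, 6) / comb(1006, 6)
print(f"all-draw samples: {share_all_draws:.3f} (exact {exact_all_draws:.3f})")
print(f"6-0 samples: {tally[(6, 0, 0)]}")
```

The exact all-draw share is roughly 96.5%, and a random 6-0 sample essentially never appears; you only get one by deliberately picking out the six decisive games.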

Somehow the above seems to overlook the basic idea of Elo ratings.

With 1006 games, 1000 draws and 6 wins, the ratings will be almost identical,
and that predicts that the outcome of any match will be a draw.  Which
matches the 1006-game match pretty well.  Just taking the 6 wins produces
a rating difference that predicts that one player will wipe out the other,
which is wrong.  Omitting the drawn games omits 99.4% of the useful data.
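
To put numbers on that, here is a minimal sketch that converts a match score into an implied rating gap. It assumes the logistic form of the Elo expectation, E = 1 / (1 + 10^(-D/400)), with a draw counted as half a point; `elo_difference` is a hypothetical helper, not code from any particular program.

```python
import math


def elo_difference(wins: int, draws: int, losses: int) -> float:
    """Rating gap D implied by a match score under the logistic Elo model.

    Solves E = 1 / (1 + 10**(-D / 400)) for D, where E is the score fraction
    with each draw worth half a point.  A perfect or zero score has no finite
    solution, so +/- infinity is returned.
    """
    games = wins + draws + losses
    if games == 0:
        raise ValueError("need at least one game")
    score = (wins + 0.5 * draws) / games
    if score >= 1.0:
        return math.inf
    if score <= 0.0:
        return -math.inf
    return 400 * math.log10(score / (1 - score))


print(elo_difference(6, 1000, 0))  # roughly +2 Elo
print(elo_difference(6, 0, 0))     # inf: a perfect score has no finite Elo gap
```

With the draws included, the gap is about 2 rating points; with only the 6 wins, the model has no finite answer at all.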

If all you care about is "who is better", then omitting the draws makes
some kind of sense, but it doesn't give any idea of _how_ much better one
is than the other: 500 rating points or .001 rating points?  I believe
that is important information.  Particularly since we are dealing with
humans and computers that can "get sick".  Suppose on a normal day we
can only draw, but I get sick and lose 6 in a row.  You conclude you
are better.  You are wrong.  The 1000 draws are much more representative
of how we compare than the 6 wins/losses, in this case.





>
>I hope this message will save some mathematical reading for some.
>
>Rémi
>
>>
>>-S.
>>
>>>Regards,
>>>Dieter




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.