Computer Chess Club Archives


Subject: Re: Junior - Crafty NPS Challenge - a user experiment

Author: Sune Fischer

Date: 15:55:18 11/24/03


On November 24, 2003 at 14:07:27, Robert Hyatt wrote:

>>The paper is a bit mathematical, but the fact that the likelihood does not
>>depend on the number of draws can be explained intuitively rather easily:
>>imagine a game called "chess+" where no draw is possible: each time a game is
>>drawn, the two players start over from the initial position until one player
>>wins. Draws are not counted. For the exact same sequence of games, depending on
>>whether you consider they play chess or chess+, the score will be 1006-1000 or
>>6-0. Obviously, the likelihood that one is better than the other is the same.
>
>I _totally_ disagree with that.  Say we play tennis matches, with no tie-breaks.
>We play 1000 sets and they all end 6-6.  Then I win the 1001th set.  You really
>conclude that provides no more information about our skills than a single game
>that ends 5-7?  The 1000 ties suggests a _lot_ about how close we are while
>the 1 set says very little.
>
>Draws count.  That's why the Elo formula specifically includes draws in the
>calculation...

You are looking at it the wrong way.
The question we want to answer is "who is better", not "how much better" or any
other related question.

Given the question we are asking, you must admit that the draws give us no
information. In fact, it doesn't matter how high the probability of a draw is,
because we care only about the probability of winning or losing.

Whether we get 2% draws or 98% draws says nothing about what happens in the
remaining 98% or 2% of the games, respectively, and *only that* is what we are
interested in.
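
To make that concrete, here is a minimal sketch of the calculation, assuming the
hypotheses quoted below (independent games, uniform prior over the win, draw and
loss probabilities). The function name and the 6-0 record with 2000 draws are
only illustrative choices, not something taken from the paper. Under those
assumptions the share of wins among decisive games follows a
Beta(wins + 1, losses + 1) distribution regardless of the draw count, so the
draws argument is accepted but never used.

```python
from math import comb


def prob_first_is_better(wins: int, losses: int, draws: int = 0) -> float:
    """Posterior probability that P(win) > P(loss) for the first player.

    Assumes independent games and a uniform prior over the (win, draw, loss)
    probabilities. Under that prior the win share among decisive games is
    Beta(wins + 1, losses + 1) whatever the draw count, so `draws` is
    accepted only to make the point that it is never used.
    """
    n = wins + losses + 1
    # P(Beta(wins + 1, losses + 1) > 1/2), written as a binomial tail sum.
    favourable = sum(comb(n, j) for j in range(wins + 1))
    return favourable / 2 ** n


# Hypothetical record: 6 wins, 0 losses, with or without 2000 recorded draws.
chess_plus = prob_first_is_better(6, 0)          # draws replayed, not counted
chess = prob_first_is_better(6, 0, draws=2000)   # same games, draws recorded
print(chess_plus, chess)
```

Both calls give 127/128, about 0.992: the confidence that the first player is
better is the same whether or not the draws are written down.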

>>Of course, this is true only if the hypotheses are true: games are independent
>>random events, and the prior is uniform (which is reasonable in comp-comp
>>matches without learning).
>
>
>This seems to fail for sampling theory.  You have 1006 games to choose from.
>Choose N random samples of 6 games.  You will be convinced that the two players
>are very close, even though one won 6 more games than the other.  But most of
>your random samples just get draws.  now take your 6 won games by themselves.
>The only 6-game sample you can take is 6-0 which suggests that the 6-side is
>way better.

If you want the Elo rating, you need knowledge of the entire distribution,
which includes the draws; however, that is not the objective here.

>If all you care about is "who is better" then omitting the draws makes
>some kind of sense, but it doesn't give any idea _how_ much better one
>is than the other.  500 rating points or .001 rating points.  I believe
>that is important information.

Actually, this isn't that important for incremental improvements.
When you make a new version of your engine, the primary question is "is it
better or worse?".
Secondary is "how much better is it?", but we can live without answering that
at all: if your new version is better, scrap the old one and continue
development on the new one.

>  Particularly since we are dealing with
>humans and computers that can "get sick".  Suppose on a normal day we
>can only draw, but I get sick and lose 6 in a row.  You conclude you
>are better.  You are wrong.  The 1000 draws are much more representative
>of how we compare than the 6 wins/losses, in this case.

You are mixing up the two questions because you feel that being 0.001 better is
being equal, and in a mathematical sense it isn't.
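
The "how much better" question is a separate calculation, and that one does use
the draws. A rough sketch, assuming the common logistic Elo curve (the thread
does not name a specific formula) and a hypothetical record of 6 wins, 0 losses
and 2000 draws, which scores 1006-1000 with half a point per draw:

```python
import math


def elo_difference(wins: int, losses: int, draws: int) -> float:
    """Elo gap implied by a match score, with each draw worth half a point.

    Uses the common logistic Elo curve (an assumption; the thread names
    no specific formula): expected score = 1 / (1 + 10 ** (-diff / 400)).
    """
    games = wins + losses + draws
    if games == 0:
        raise ValueError("need at least one game")
    score = (wins + 0.5 * draws) / games
    if score in (0.0, 1.0):
        # A perfect score has no finite Elo gap under this model.
        return math.inf if score == 1.0 else -math.inf
    return 400.0 * math.log10(score / (1.0 - score))


# Hypothetical records: the same six decisive games, with and without draws.
print(elo_difference(6, 0, 2000))
print(elo_difference(6, 0, 0))
```

With the draws included the estimated gap is about 1 Elo point; without them it
is unbounded. So the draws change the size of the estimated gap a great deal,
while the sign of the gap is decided by the six decisive games alone.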

-S.



Last modified: Thu, 15 Apr 21 08:11:13 -0700