Author: Robert Hyatt
Date: 11:07:27 11/24/03
On November 24, 2003 at 12:59:39, Rémi Coulom wrote:

>On November 23, 2003 at 19:26:35, Sune Fischer wrote:
>
>>On November 23, 2003 at 14:31:43, Dieter Buerssner wrote:
>>
>>>On November 22, 2003 at 20:01:27, Robert Hyatt wrote:
>>>
>>>>I disagree.
>>>
>>>Ditto
>>>
>>>>6-0-0 vs 6-0-1000 are way different results. And the
>>>>rating and rating error bar would be far different.
>>>
>>>I tried to explain that Elo rating is not an objective measure for the
>>>likelihood that one is better.
>>
>>Elo doesn't try to measure the likelihood at all, that is the problem.
>>
>>Anyone who understands real world numbers knows that they don't make a lot of
>>sense without some knowledge of their tolerance.
>>
>>>>With a 6 0 result
>>>>I would conclude the 6 side is significantly better. With 6 wins and
>>>>1000 draws I would not conclude _either_ was better with any confidence.
>>>
>>>Both results are identical for the question of the likelihood of who is better.
>>
>>Without having read his paper I'd say that a 6-0 score indicates the winner is
>>far better than the loser, but the confidence is very low.
>>
>>Whereas a 10006-10000 result indicates the players are almost equal with a very
>>high confidence.
>>
>>I don't think that necessarily contradicts what you say though.
>
>No, it does not.
>
>>
>>>If I cannot convince you, perhaps have a look at Rémi Coulom's paper, available
>>>from http://remi.coulom.free.fr/ (inside
>>>http://remi.coulom.free.fr/WhoIsBest.zip). One quote from that paper:
>>>
>>>"This proves that the likelihood that the first player is best does not depend
>>>on the number of draws."
>>
>>Something to read tonight perhaps :)
>
>The paper is a bit mathematical, but the fact that the likelihood does not
>depend on the number of draws can be explained intuitively rather easily:
>imagine a game called "chess+" where no draw is possible: each time a game is
>drawn, the two players start over from the initial position until one player
>wins.
>Draws are not counted. For the exact same sequence of games, depending on
>whether you consider they play chess or chess+, the score will be 1006-1000 or
>6-0. Obviously, the likelihood that one is better than the other is the same.

I _totally_ disagree with that. Say we play tennis matches, with no tie-breaks.
We play 1000 sets and they all end 6-6. Then I win the 1001st set. Do you really
conclude that this provides no more information about our skills than a single
game that ends 5-7? The 1000 ties suggest a _lot_ about how close we are, while
the 1 set says very little. Draws count. That's why the Elo formula specifically
includes draws in the calculation...

>
>Of course, this is true only if the hypotheses are true: games are independent
>random events, and the prior is uniform (which is reasonable in comp-comp
>matches without learning).

This seems to fail for sampling theory. You have 1006 games to choose from.
Choose N random samples of 6 games. Most of your random samples will contain
nothing but draws, so you will be convinced that the two players are very close,
even though one won 6 more games than the other. Now take your 6 won games by
themselves. The only 6-game sample you can take is 6-0, which suggests that the
6-side is way better. I.e., this could be tested with a Monte Carlo approach
pretty easily...

Somehow the above seems to overlook the basic idea of Elo ratings. With 1006
games, 1000 draws, 6 wins, the ratings will be almost identical, and that
predicts that the outcome of any match will be a draw, which matches the
1006-game match pretty well. Just taking the 6 wins produces a rating difference
that predicts that one player will wipe up the other, which is wrong.

Omitting the drawn games omits over 99% of the useful data. If all you care
about is "who is better" then omitting the draws makes some kind of sense, but
it doesn't give any idea _how_ much better one is than the other: 500 rating
points or .001 rating points. I believe that is important information.
Particularly since we are dealing with humans and computers that can "get sick".
Suppose on a normal day we can only draw, but I get sick and lose 6 in a row.
You conclude you are better. You are wrong. The 1000 draws are much more
representative of how we compare than the 6 wins/losses, in this case.

>
>I hope this message will save some mathematical reading for some.
>
>Rémi
>
>>
>>-S.
>>
>>>Regards,
>>>Dieter
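
The Monte Carlo check mentioned in the post can be sketched roughly as follows. This is only an illustration in Python, not code from the thread or from Rémi Coulom's paper. The function names (elo_diff, all_draw_fraction, prob_first_is_better) are hypothetical. The "uniform prior" is assumed here to mean a flat Dirichlet prior over the win/draw/loss probabilities, and the Elo difference uses the common logistic formula. Note that the arguments are ordered (wins, draws, losses), unlike the 6-0-1000 notation used in the thread.

```python
import math
import random


def elo_diff(wins, draws, losses):
    """Elo difference implied by a match score (logistic formula, an assumption)."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    if score <= 0.0:
        return -math.inf
    if score >= 1.0:
        return math.inf  # a 100% score has no finite Elo difference
    return 400.0 * math.log10(score / (1.0 - score))


def all_draw_fraction(wins, draws, losses, sample_size=6, trials=20000, seed=1):
    """Fraction of random samples (without replacement) that contain only draws."""
    rng = random.Random(seed)
    games = ["W"] * wins + ["D"] * draws + ["L"] * losses
    hits = 0
    for _ in range(trials):
        if all(g == "D" for g in rng.sample(games, sample_size)):
            hits += 1
    return hits / trials


def prob_first_is_better(wins, draws, losses, trials=20000, seed=1):
    """Estimate P(p_win > p_loss | results), assuming a flat Dirichlet prior."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Posterior is Dirichlet(wins+1, draws+1, losses+1); sample via gammas.
        w = rng.gammavariate(wins + 1, 1.0)
        d = rng.gammavariate(draws + 1, 1.0)
        l = rng.gammavariate(losses + 1, 1.0)
        total = w + d + l
        if w / total > l / total:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for record in [(6, 0, 0), (6, 1000, 0)]:
        print(record,
              "Elo diff:", elo_diff(*record),
              "P(first better):", prob_first_is_better(*record))
    print("all-draw 6-game samples:", all_draw_fraction(6, 1000, 0))
```

Under these assumptions, both records should give roughly the same estimate for "who is better", which is what the quoted sentence from the paper claims. The Elo difference and the 6-game samples differ sharply between the two records, which is the "how much better" information argued for in the reply.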