Computer Chess Club Archives


Subject: Re: Elo Rating System Fundamentally Flawed?

Author: Robert Hyatt

Date: 06:35:39 01/05/00


On January 05, 2000 at 05:13:17, Graham Laight wrote:

>On January 04, 2000 at 18:37:12, Albert Silver wrote:
>
>>On January 04, 2000 at 18:19:18, Graham Laight wrote:
>>
>>>On January 04, 2000 at 13:17:14, Albert Silver wrote:
>>>
>>>>Why would doubling the speed make much of a difference in 40/2? It isn't
>>>>bullheaded, it's logical. So far I believe it is capable of PERFORMING at 2500
>>>>against WEAKER opposition. I am not convinced it will perform the same against
>>>>Grandmasters. If it performs 2500 against 2300 players but 2300 against 2500
>>>>players, it isn't playing at grandmaster strength.
>>>>
>>
>>>
>>>If the last sentence is possible then there's something seriously wrong with the
>>>Elo rating system.
>>
>>I not only believe it is possible, I think it's common. Just as the opposite is
>>common as well, though not among computers.
>>
>>                                         Albert Silver
>
>If Albert is right, then there really is a serious problem with the Elo rating
>system.
>
>In a 10 game match between two 2500 players, the expected score would be 5-5.
>


Not exactly, any more than if you flip a coin 10 times, you expect to get 5-5.
You might get 5-5 every now and then...  but you will also get 6-4, and even
10-0 on rare occasions.  In fact, if you _never_ get 10-0 over a long enough
run of trials, the coin is not perfectly random.

This is about statistics, which predict a spread of results, not an exact score.
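
To put rough numbers on the coin analogy, here is a small Python sketch. It
assumes a fair coin and independent flips, and it ignores draws, which chess
has and coins do not. The function name `split_probability` is just
illustrative. Under those assumptions an exact 5-5 split turns up only about a
quarter of the time, and a 10-0 sweep by one particular side about once in
1024 tries.

```python
from math import comb


def split_probability(heads: int, flips: int = 10) -> float:
    """Probability of exactly `heads` heads in `flips` tosses of a fair coin."""
    return comb(flips, heads) / 2 ** flips


# Print the chance of each split from 10-0 down to 5-5 (one side's view).
for heads in range(10, 4, -1):
    print(f"{heads}-{10 - heads}: {split_probability(heads):.4f}")
```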

I do believe that the Elo formula is being used wrongly on the chess servers.
There are two components:  the statistical model that predicts outcomes, and
the "K" factor, which controls how quickly your rating changes.  K=32 is a good
value for the typical tournament player who may play a max of 100 games a year.
It is way too big for the typical server player who might play 100 games in a
day.  I'd like to see a reduced K of 8 or so.  Or else the Glicko rating, where
the effective K varies depending on how many games you play in a given period
of time...
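
A minimal Python sketch of the two components, assuming the commonly published
logistic form of the Elo expectation with a 400-point scale. Whether any given
server uses exactly this formula is an assumption here, and the names
`expected_score` and `update_rating` are illustrative. It shows how much one
loss to an equally rated opponent moves a 2000 rating under K=32 versus K=8.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Predicted score (0..1) for `rating` against `opponent` (logistic form)."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def update_rating(rating: float, opponent: float, score: float, k: float) -> float:
    """New rating after one game; `score` is 1 for a win, 0.5 draw, 0 loss."""
    return rating + k * (score - expected_score(rating, opponent))


# One loss to an equal opponent: the K factor alone sets the size of the swing.
for k in (32, 8):
    print(f"K={k}: 2000 -> {update_rating(2000, 2000, 0.0, k):.1f}")
```

With a hundred games a day, the larger K lets a short run of luck drag the
rating a long way before the results average out.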



>Between a 2300 player and a 2500 player, I think it should be about 2.5-7.5 (by
>all means correct me if I'm wrong).
>
>If, in a large statistical sample, this can be shown to not occur, then we must
>conclude that the Elo rating system does not work, and should be abandoned.


It definitely doesn't work as you want...  because it is based on statistics
and sampling theory.  You get a rating, _and_ a confidence interval.
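
For reference, on the usual logistic form of the Elo curve a 200-point gap
predicts about 24% for the weaker side, i.e. roughly 2.4-7.6 over 10 games.
The sketch below illustrates the confidence-interval point: it turns an
observed score fraction into an implied rating difference with a rough 95%
band. It assumes independent win/loss games with no draws and a normal
approximation, both simplifications, and the function names are illustrative.

```python
from math import log10, sqrt


def rating_diff(score_fraction: float) -> float:
    """Rating difference implied by a score fraction (logistic Elo curve)."""
    return -400.0 * log10(1.0 / score_fraction - 1.0)


def diff_interval(score_fraction: float, games: int, z: float = 1.96):
    """Rough interval on the implied rating difference after `games` games.

    Assumes independent win/loss games and a normal approximation.
    """
    se = sqrt(score_fraction * (1.0 - score_fraction) / games)
    # Clamp so the logarithm stays defined if the approximation leaves (0, 1).
    low = max(score_fraction - z * se, 1e-6)
    high = min(score_fraction + z * se, 1.0 - 1e-6)
    return rating_diff(low), rating_diff(high)


# The same 25% score pins the difference down far better over more games.
for games in (100, 1000):
    low, high = diff_interval(0.25, games)
    print(f"{games} games: {low:.0f} to {high:.0f} points")
```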




>
>Given that so many organisations have put so much faith in the Elo system, I
>suspect that it does have validity, and that Albert is not entirely correct in
>his belief.
>
>Does anyone out there know how well Professor Elo did his studies, and whether
>any follow up studies were done to check whether his rating system is correct?
>
>-g


It was based on normal statistics, derived from the central limit theorem.
I.e., the math is correct.  But just like the coin toss, nothing is exact about
_predicting_ an outcome...

until we have a time machine, that is. Then prediction won't be needed any
longer, and we will get exact results.  :)
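
On the normal-statistics basis mentioned above, here is a hedged Python sketch.
It assumes the textbook form of the model, in which each player's performance
is normally distributed with a standard deviation of 200 points, so the
difference between two players has a standard deviation of 200*sqrt(2). Those
constants and the function names are assumptions for illustration, not taken
from this post. The logistic curve is included for comparison.

```python
from math import sqrt
from statistics import NormalDist


def expected_normal(diff: float, sigma: float = 200.0) -> float:
    """Expected score for the player rated `diff` points higher (normal model)."""
    return NormalDist(0.0, sigma * sqrt(2.0)).cdf(diff)


def expected_logistic(diff: float) -> float:
    """Expected score for the same rating gap under the logistic curve."""
    return 1.0 / (1.0 + 10 ** (-diff / 400.0))


# Compare the two curves at a few rating gaps.
for diff in (0, 100, 200, 400):
    print(f"{diff:>3}: normal {expected_normal(diff):.3f}, "
          f"logistic {expected_logistic(diff):.3f}")
```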



Last modified: Thu, 15 Apr 21 08:11:13 -0700
