Computer Chess Club Archives



Subject: Re: Lies.. Damn Lies & Statistics! ELO Ratings

Author: chandler yergin

Date: 17:44:10 01/12/05



http://en.wikipedia.org/wiki/Elo_rating_system


Partial Quoting:

Mathematical details
Performance can't be measured absolutely, it can only be inferred from wins and
losses. Ratings therefore have meaning only relative to other ratings.
Therefore, both the average and the spread of ratings can be arbitrarily chosen.
Élő suggested scaling ratings so that a difference of 200 rating points in chess
would mean that the stronger player has an expected score of approximately 0.75,
and the USCF initially aimed for an average club player to have a rating of
1500.

A player's expected score is his probability of winning plus half his
probability of drawing. Thus an expected score of 0.75 could represent a 75%
chance of winning, 25% chance of losing, and 0% chance of drawing. On the other
extreme it could represent a 50% chance of winning, 0% chance of losing, and 50%
chance of drawing. The probability of drawing, as opposed to having a decisive
result, is not specified in the ELO system. Instead a draw is considered half a
win and half a loss.
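As a small illustration of that definition (a sketch only; the function name `expected_score` is made up for this example), the expected score can be computed directly from win and draw probabilities. It also shows why the system cannot tell the two scenarios above apart: only the combined score enters the rating.

```python
def expected_score(p_win: float, p_draw: float) -> float:
    """Expected score = P(win) + 0.5 * P(draw); a loss contributes nothing."""
    if p_win < 0 or p_draw < 0 or p_win + p_draw > 1:
        raise ValueError("probabilities must be non-negative and sum to at most 1")
    return p_win + 0.5 * p_draw


# The two extremes described in the text give the same expected score.
print(expected_score(0.75, 0.0))  # 75% win, 25% loss, no draws -> 0.75
print(expected_score(0.50, 0.50))  # 50% win, 50% draw, no losses -> 0.75
```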

The above explains ELO for games where draws can occur. ELO ranking for games
without the possibility of draws (Go, Backgammon) is discussed in the article
"Go rating with ELO". That article also explains why winning chances are not
cumulative for big ELO differences in those zero-sum, full-information games,
where the result can have a quantity (small or big margin) in addition to a
quality (win or loss), as in Go.

If Player A has true strength R_A and Player B has true strength R_B, the exact
formula (using the logistic curve) for the expected score of Player A is

$$E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}.$$

Similarly the expected score for Player B is

$$E_B = \frac{1}{1 + 10^{(R_A - R_B)/400}}.$$
Note that E_A + E_B = 1. In practice, since the true strength of each player is
unknown, the expected scores are calculated using the players' current ratings.
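A minimal Python sketch of the expected-score formula E_A = 1 / (1 + 10^((R_B - R_A)/400)), assuming the usual base-10 logistic curve with a 400-point scale (the constants that make a 200-point gap worth roughly 0.75). The helper name `elo_expected` and the 1813 rating are hypothetical, chosen only for illustration.

```python
def elo_expected(r_a: float, r_b: float) -> float:
    """Expected score of a player rated r_a against one rated r_b.

    Logistic curve in base 10 with a 400-point scale (assumed standard constants).
    """
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


e_a = elo_expected(1813, 1613)  # A is 200 points stronger than B
e_b = elo_expected(1613, 1813)  # the same game from B's side
print(round(e_a, 3), round(e_b, 3), round(e_a + e_b, 10))  # 0.76 0.24 1.0
```

The two expected scores always sum to 1, and `elo_expected(1613, 1609)` gives about 0.506, the first per-game value in the tournament example.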

When a player's actual tournament scores exceed his expected scores, the ELO
system takes this as evidence that that player's rating is too low, and needs to
be adjusted upward. Similarly when a player's actual tournament scores fall
short of his expected scores, that player's rating is adjusted downward. Élő's
original suggestion, which is still widely used, was a simple linear adjustment
proportional to the amount by which a player overperformed or underperformed his
expected score. The maximum possible adjustment per game (sometimes called the
K-value) was set at K=16 for masters and K=32 for weaker players.

Suppose Player A was expected to score E_A points but actually scored S_A
points. The formula for updating his rating R_A is

$$R_A' = R_A + K(S_A - E_A).$$

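The linear update new rating = R_A + K(S_A - E_A) can be sketched in a few lines of Python. The name `elo_update` is hypothetical; the K values 32 and 16 are the ones quoted above for weaker players and masters.

```python
def elo_update(rating: float, expected: float, actual: float, k: float = 32.0) -> float:
    """Linear Elo adjustment: new rating = rating + K * (actual - expected).

    `expected` and `actual` can be for a single game or summed over a
    tournament or rating period.
    """
    return rating + k * (actual - expected)


# An even game (E = 0.5) that the player wins (S = 1):
print(elo_update(1613, expected=0.5, actual=1.0, k=32))  # 1629.0 with K=32
print(elo_update(1613, expected=0.5, actual=1.0, k=16))  # 1621.0 with K=16
```

For a single game |S - E| never exceeds 1, so K is the largest possible change per game, which is why it is described as the maximum adjustment.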
This update can be performed after each game or each tournament, or after any
suitable rating period. An example may help clarify. Suppose Player A has a
rating of 1613, and plays in a five-round tournament. He loses to a player rated
1609, draws with a player rated 1477, defeats a player rated 1388, defeats a
player rated 1586, and loses to a player rated 1720. His actual score is (0 +
0.5 + 1 + 1 + 0) = 2.5. His expected score, calculated according to the formula
above, was (0.506 + 0.686 + 0.785 + 0.539 + 0.351) = 2.867. Therefore his new
rating is (1613 + 32*(2.5 - 2.867)) = 1601.

Note that while two wins, two losses, and one draw may seem like a par score, it
is worse than expected for Player A because his opponents were lower rated on
average (1556, compared with his own 1613). Therefore he is slightly penalized.
If he had scored two wins, one
loss, and two draws, for a total score of three points, that would have been
slightly better than expected, and his new rating would have been (1613 + 32*(3
- 2.867)) = 1617.
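The worked example can be reproduced with a short Python sketch that sums the per-game expected scores (base-10 logistic curve, 400-point scale) and applies a single K=32 update over the whole tournament. The helper names are hypothetical. For the alternative three-point result, which games are the draws is an arbitrary choice, since only the totals affect the update.

```python
from typing import Sequence, Tuple


def elo_expected(r_a: float, r_b: float) -> float:
    """Expected score of a player rated r_a against one rated r_b (base 10, 400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def tournament_update(rating: float, opponents: Sequence[float],
                      results: Sequence[float], k: float = 32.0) -> Tuple[float, float, float]:
    """Return (expected total, actual total, new rating) for one rating period."""
    if len(opponents) != len(results):
        raise ValueError("need exactly one result per opponent")
    expected = sum(elo_expected(rating, opp) for opp in opponents)
    actual = float(sum(results))
    return expected, actual, rating + k * (actual - expected)


opponents = [1609, 1477, 1388, 1586, 1720]

# Results from the text: loss, draw, win, win, loss (1 = win, 0.5 = draw, 0 = loss)
e, s, new = tournament_update(1613, opponents, [0, 0.5, 1, 1, 0])
print(f"expected={e:.3f} actual={s} new rating={round(new)}")
# expected=2.867 actual=2.5 new rating=1601

# Alternative: two wins, one loss, two draws (draw placement chosen arbitrarily)
_, s2, new2 = tournament_update(1613, opponents, [0.5, 0.5, 1, 1, 0])
print(f"actual={s2} new rating={round(new2)}")
# actual=3.0 new rating=1617
```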

This updating procedure is at the core of the ratings used by FIDE, USCF, Yahoo!
Games, the ICC, and FICS. However, each organization has taken a different route
to deal with the uncertainty inherent in the ratings, particularly the ratings
of newcomers, and to deal with the problem of ratings inflation/deflation. New
players are assigned provisional ratings, which are adjusted more drastically
than established ratings, and various methods (none completely successful) have
been devised to inject points into the rating system so that ratings from
different eras are roughly comparable.

(The Diagrams are missing from the Text.. of course..)

HOW can you Programmers assign an ELO Rating to a Computer... unless it has
played Humans?

IF you read the article, you'll see that for the TOP Players.. whether by Win
or loss... they might only lose ONE POINT!
SOoooooooo this CRAP about ELO Ratings "jumping" in 10 or 20 years, or
exponentially, is CRAP!
STOP the NONSENSE!




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.