Computer Chess Club Archives



Subject: Re: Russek -Rebel Match, Game 2

Author: Stephen A. Boak

Date: 15:58:56 01/01/00



>On January 01, 2000 at 10:20:48, James T. Walker wrote:

<snip>

>...I think if you take all aspects of the game into account you will
>find it's (60 Points) about right vs humans too.  This may diminish as ratings
>get higher because most ratings systems use different "K" factor to calculate
>the ratings.  The ICC ratings are inflated I think partly because they use K=32
>for all players while the USCF uses K=8 for 2400 and above.  The lower K factor
>causes more stability in the ratings and less inflation at the higher levels.
>This may be what is causing the 60 points to diverge at higher ratings and to be
>even more at lower ratings.  This is all my theory of course.
>Seasons Greetings,
>Jim Walker

Hi Jim,
  In an ELO-based system, rating lags performance--always.
  A newly calculated rating takes into account not only the player's recent
games, but also the player's performance over many prior games (as embodied in
the last calculated ELO rating of the player, i.e. Starting Rating).
  A difference in ELO ratings leads directly (due to the math behind the ELO
rating system) to a statistically valid expectation for performance of one
player versus another, or one player versus a particular field of opponents.
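As a rough illustration of how a rating difference turns into an expected score, here is a minimal Python sketch. It assumes the logistic curve with a 400-point scale that most modern implementations use. Elo's own tables were derived from the normal distribution, so published figures differ slightly. The function names are illustrative only.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (0..1) of player A in one game against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def expected_vs_field(rating: float, opponents: list[float]) -> float:
    """Total expected score against a field: the sum of per-game expectations."""
    return sum(expected_score(rating, opp) for opp in opponents)


if __name__ == "__main__":
    print(round(expected_score(2600, 2500), 3))                   # 0.64
    print(round(expected_vs_field(2600, [2500, 2600, 2700]), 3))  # 1.5
```

A 100-point edge is worth about 64% per game, and against a field averaging your own rating the expectation comes out to half the games.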
  The statistical basis for ELO-ratings takes account of (assumes) natural
variation--the fact that a player's results (versus expectation) will vary
randomly, event after event.  The ELO system assigns ratings that statistically
lead (through math calculations) to the average expected result versus any
particular opponent or field of opponents with known ratings.  Otherwise your
rating would simply reflect the results of your latest play.
  The ELO system also measures the growth or decline in playing strength of a
player over time.  Indeed one cannot know if a player's strength has grown or
declined except by measuring it *over time*.
  The K-factor has very little to do with inflation or deflation in general and
nothing to do with the relative comparison of human strength vs computer
strength.
  It is a factor that helps a player rise or fall in rating more quickly
(although there is always a lag, even among the lower rated players for whom
the K-factor is higher in the USCF), according to his recent playing
form.  More properly stated, it is a mathematical factor that restricts or aids
the speed at which ratings change.
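The usual update rule makes the K-factor's role concrete: the new rating is the old one plus K times the difference between actual and expected score over the rating period. A minimal sketch, again assuming the logistic expectancy curve; `update_rating` is a hypothetical helper, not any federation's official code.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Logistic Elo expectancy for one game (400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


def update_rating(rating: float, results: list[tuple[float, float]], k: float) -> float:
    """New rating after a rating period.

    results: (opponent_rating, score) pairs, with score 1, 0.5 or 0.
    k: the K-factor; it scales how far the period's surprise moves the rating.
    """
    surprise = sum(score - expected_score(rating, opp) for opp, score in results)
    return rating + k * surprise


if __name__ == "__main__":
    one_win = [(2000.0, 1.0)]
    print(update_rating(2000.0, one_win, k=32))  # 2016.0
    print(update_rating(2000.0, one_win, k=8))   # 2004.0
```

The same win moves the rating four times as far with K=32 as with K=8. K sets the speed of the change, not its direction.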
  In general, over time and many games of play, a human has achieved his proper
rating, relative to the other players in the overall rating pool.  This is an
underlying assumption of the ELO system.
  Some particular humans may be deemed overrated (rating is inflated), some may
be deemed underrated (rating is deflated), but by and large the overall group is
rated properly on the average.
  The fact that ratings at high levels in the USCF are governed by a smaller
K-factor and therefore do not change as rapidly as for a lower rated player does
not hinder a strong player's rating from rising or falling, over time, to its
proper level.  It simply governs how quickly (how many games it will take, i.e.
how long the lag will be) a rating comes to reflect true strength, assuming the
player has risen or dropped in true strength (perhaps by studying/playing a lot
versus being inactive in both aspects or through aging--whatever the reason), or
is a new player for whom a measure of strength must be established--over time.
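To see the lag in numbers, the sketch below follows a hypothetical player whose true strength has risen to 2200 while his rating is still 2000. It makes two simplifying assumptions that real play does not satisfy: every opponent is rated 2000, and in each game he scores exactly the expectation his true strength implies (no random variation). Only the K-factor changes between runs.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Logistic Elo expectancy for one game (400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


def games_to_catch_up(start: float, true_strength: float, field: float, k: float,
                      tolerance: float = 50.0, max_games: int = 10_000) -> int:
    """Count games until the rating is within `tolerance` points of true strength."""
    rating = start
    # Per-game score the player actually achieves at his (higher) true strength.
    actual = expected_score(true_strength, field)
    for game in range(1, max_games + 1):
        # Standard update: K times (actual score minus rating-based expectation).
        rating += k * (actual - expected_score(rating, field))
        if abs(true_strength - rating) <= tolerance:
            return game
    return max_games


if __name__ == "__main__":
    for k in (32, 8):
        print(k, games_to_catch_up(2000.0, 2200.0, 2000.0, k))
```

Under these assumptions the K=8 run needs roughly four times as many games as the K=32 run to close the same gap, yet both ratings head toward the same 2200.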
  Consider that J. Polgar, after losing to A. Shirov by a 5.5 to 0.5 score, might
have dropped several hundred rating points 'instantly' if her new rating thereafter
were based solely on her particular (latest) match at that time; that is, if her new
rating were not based on 1) her prior rating based on prior games, and 2) the
K-factor (whatever it might be in the FIDE ELO system), as well as 3) her recent
performance.
  Note a similar observation holds for Shirov, who might otherwise have vaulted
'instantly' to a rating much higher than Kasparov!
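The Polgar and Shirov example can be put in numbers. The ratings (2670 and 2750) and the K-factor of 10 below are placeholders chosen for illustration, not the players' actual FIDE figures at the time. The expectancy is again the logistic approximation, and the hypothetical `performance_rating` helper simply inverts that curve for a single opponent.

```python
import math


def expected_score(rating: float, opponent: float) -> float:
    """Logistic Elo expectancy for one game (400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


def match_update(rating: float, opponent: float, score: float, games: int, k: float) -> float:
    """Rating after a match of `games` games against one opponent."""
    return rating + k * (score - games * expected_score(rating, opponent))


def performance_rating(opponent: float, score: float, games: int) -> float:
    """Rating for which `score` out of `games` is exactly the expected result."""
    p = score / games
    if not 0.0 < p < 1.0:
        raise ValueError("performance rating is unbounded for a 0% or 100% score")
    return opponent - 400.0 * math.log10(1.0 / p - 1.0)


if __name__ == "__main__":
    polgar, shirov, k, games = 2670.0, 2750.0, 10.0, 6  # hypothetical inputs
    print(round(match_update(polgar, shirov, 0.5, games, k), 1))  # 2651.8
    print(round(performance_rating(shirov, 0.5, games)))          # 2333
    print(round(match_update(shirov, polgar, 5.5, games, k), 1))  # 2768.2
    print(round(performance_rating(polgar, 5.5, games)))          # 3087
```

With these placeholder inputs, the one-match performance alone would put Polgar near 2333 and Shirov above 3000, while the K-weighted update moves each rating by only about 18 points.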
  Polgar simply had a natural variation in results (versus expectancy) and did
much worse than her 'average' expectation for scoring based on the starting FIDE
ratings for the players.  Her true strength is presumably still in the 2600s,
despite a single bad match or tournament result.  By the same token, Shirov
simply had a natural variation in results (versus expectancy) and did much
better than his 'average' expectation for scoring based on the starting FIDE
ratings for the players.
  The K-factor used in ELO rating changes not only limits how quickly a rating
can rise after a good performance, but also protects against too rapid a loss
in rating after a poor performance (below ELO expectation).
  I don't call it a factor to regulate inflation/deflation, but merely a factor
to regulate the relative weighting of a player's prior performance (embodied in
the starting rating for a rated period of reported games) versus their recent
performance in the new games submitted for rating in the rating period.
  Don't forget, the K-factor works both ways--to limit the speed of gain in
rating points, and to limit the speed of loss of rating points.  That is
even-handed in general, favoring neither inflation nor deflation of a pool of
players in the ELO-based rating system.
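The "works both ways" point can be checked directly for a single game when both players share the same K: because the two expectancies always sum to 1, whatever one player gains the other loses. A minimal sketch under that same-K assumption, using the logistic curve again; `game_deltas` is a hypothetical name.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Logistic Elo expectancy for one game (400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


def game_deltas(rating_a: float, rating_b: float, score_a: float,
                k: float) -> tuple[float, float]:
    """Rating changes for A and B after one game, both players using the same K."""
    delta_a = k * (score_a - expected_score(rating_a, rating_b))
    delta_b = k * ((1.0 - score_a) - expected_score(rating_b, rating_a))
    return delta_a, delta_b


if __name__ == "__main__":
    for score in (1.0, 0.5, 0.0):
        da, db = game_deltas(2400.0, 2300.0, score, k=16)
        print(f"score {score}: A {da:+.2f}, B {db:+.2f}, sum {da + db:+.2f}")
```

Between equally rated players, a win and a loss move a rating by the same K/2 in opposite directions.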
  You might find the book written by Arpad Elo on his rating system to be very
informative and a big help to understand the mathematics and statistical
foundation intentionally devised by Elo as underpinnings to his rating system.
  I do not claim, nor did Elo claim, that his rating system was 'perfect' for
measuring true playing strength of any particular player.  Quite the opposite, so to
speak.  He intentionally took into account the known statistical fact that
natural variation occurs in all measured processes, including play of rated
human players.  He also intentionally took into account that player strength may
increase or decline over time.
  The K-factor is a necessary factor (no particular value for the K-factor is
necessary, however) to establish mathematically by how much performance will
lead rating, or rating will lag performance.  This balances a new rating
calculation somewhere between prior rating and current performance (for example,
a recent TPR, or tournament performance rating).  A new rating never jumps to equal the
latest TPR nor to exceed the latest
TPR--it only changes a portion of the distance to latest TPR and approaches it.
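The claim that a new rating covers only part of the distance to the latest TPR can be checked numerically. The sketch assumes a hypothetical 2000-rated player who scores 7 out of 9 against a field rated 2000 throughout, and measures what fraction of the gap between the old rating and that event's TPR one update closes for a few typical K values. The helper name and inputs are illustrative.

```python
import math


def expected_score(rating: float, opponent: float) -> float:
    """Logistic Elo expectancy for one game (400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


def gap_fraction(rating: float, field: float, score: float, games: int, k: float) -> float:
    """Fraction of the distance from the old rating to the event TPR closed by one update.

    Assumes the score differs from the rating-based expectation (otherwise the gap is 0).
    """
    p = score / games
    if not 0.0 < p < 1.0:
        raise ValueError("TPR is unbounded for a 0% or 100% score")
    tpr = field - 400.0 * math.log10(1.0 / p - 1.0)  # rating at which the score is exactly expected
    new_rating = rating + k * (score - games * expected_score(rating, field))
    return (new_rating - rating) / (tpr - rating)


if __name__ == "__main__":
    for k in (8, 16, 32):
        print(k, round(gap_fraction(2000.0, 2000.0, 7.0, 9, k), 3))
```

In this example the TPR is about 2218, and even K=32 closes only about 37% of the gap in one rating period; the rest has to be earned by further results.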
  Yes, it helps establish some kind of stability, more or less, for ratings in a
pool of rated players who experience both natural variation around the result
expected from their last ELO rating, and growth or decline in strength over time.
It doesn't favor or disfavor ratings inflation in general, at any particular
level, even in the USCF rating system, which uses three different K-factors.  The ELO
system assumes that a player's playing strength may be measured in some manner,
over time, and thus assumes some inherent stability in true playing strength,
for most players in a rating pool--they are neither growing rapidly in strength
nor falling rapidly in strength.
  The ELO system also carefully handles new players entering a rating pool, to
establish their starting ratings without undue inflation or deflation.
  Take care,
    --Steve Boak





Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.