Computer Chess Club Archives

Subject: Continuation re: I am laughing, plus Rating Forecasting

Author: Stephen A. Boak
Date: 19:58:27 05/19/00
On May 19, 2000 at 22:29:34, Stephen A. Boak wrote:

>    :)
>
>    There are many notions as to what should be rated or not rated.  None are
>right and none are wrong, a priori, i.e. in the abstract.  Judgement of the
>notions depends on what one is trying to achieve (measure, rate, etc).
>
>    Humorous to me is something I was contemplating a bit earlier, when posting
>elsewhere (above or below I don't remember at this instant) about some rating
>calculations I do for myself (TPR, GPR, ATPR, AGPR, etc.) and other ideas for
>interesting statistics and measures (I like graphs, especially, of time series
>trends):
>
>    By necessity our ELO-based calculations are founded on past games.  We are
>obtaining a measure, largely, of how a program *has* performed (in the past).
>
>    Statistics books will warn you, just like stock prospectuses and
>advertisements for investment funds, 'Past performance is no guarantee of
>similar future performance, which may not be as good.' or some such similar
>phrase.
>
>    It is only when we consider program performance trends (over time) that we
>begin to think about how a program will perform in the future (next game, next
>event, etc).
>
>    --Steve

Whoops, I hit Submit Follow Up accidentally before finishing my lecture.  :)

ELO-based systems take natural variation in results as a premise, along with
certain statistical assumptions.  A player's rating may gradually rise or
gradually fall over time, but it will *always* fluctuate up and down from event
to event.
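To make the "always fluctuates" point concrete, here is a minimal Python sketch. It assumes the common logistic expectancy on a 400-point scale, a fixed K-factor of 16, a single opponent rated exactly at the player's (fixed) true strength, and no draws. None of these parameters come from any particular rating agency; they are illustrative only. Even though the player's true strength never changes, the rating wanders above and below it from game to game.

```python
import random


def expected_score(rating: float, opp_rating: float) -> float:
    """Winning expectancy, logistic form on the usual 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((opp_rating - rating) / 400.0))


def simulate_fixed_strength(true_rating=2400.0, opp_rating=2400.0,
                            games=200, k=16.0, seed=1):
    """Rate a player whose true strength never changes.

    Assumptions (illustrative only): one opponent of fixed rating,
    fixed K-factor, decisive games only (no draws).
    """
    rng = random.Random(seed)
    p_win = expected_score(true_rating, opp_rating)  # true chance, constant
    rating = true_rating                             # start at the truth
    history = [rating]
    for _ in range(games):
        score = 1.0 if rng.random() < p_win else 0.0
        # Standard update: move by K times (actual - expected).
        rating += k * (score - expected_score(rating, opp_rating))
        history.append(rating)
    return history


history = simulate_fixed_strength()
print(f"true 2400, rated range {min(history):.0f} to {max(history):.0f}, "
      f"final {history[-1]:.0f}")
```

Changing the seed gives a different path, but always the same picture: noise around an unchanging true strength, with no built-in drift in either direction.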

Because the calculations are carried out event by event, generally in time
sequence, ELO-based systems weight recent games more heavily than old ones.
Suppose, for example, that one game was rated as a loss (say, because of a
reporting error) when it was in fact a win.  After perhaps 50 or so subsequent
games have been played and rated, that error becomes irrelevant: it is lost in
the noise, probably even below the noise threshold.
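A rough sketch of why an old error washes out, under simplified, hypothetical assumptions (logistic expectancy, fixed K = 32, a single opponent rated 1500). Two rating histories differ only in whether the first game was recorded as a win or a loss; both then receive the identical 50 later results. Each later update nudges the higher track down and the lower track up a little, so the gap shrinks by roughly 4 to 5 percent per game.

```python
def expected_score(rating: float, opp_rating: float) -> float:
    """Winning expectancy, logistic form on the usual 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((opp_rating - rating) / 400.0))


def rate(results, start=1500.0, opp_rating=1500.0, k=32.0):
    """Apply one ELO-style update per game against a single fixed opponent."""
    rating = start
    for score in results:
        rating += k * (score - expected_score(rating, opp_rating))
    return rating


# Hypothetical later results, identical in both histories.
later = [0.0, 1.0] * 25

gaps = {}
for n in (0, 10, 25, 50):
    correct = rate([1.0] + later[:n])      # first game recorded as the win it was
    misreported = rate([0.0] + later[:n])  # first game wrongly recorded as a loss
    gaps[n] = correct - misreported
    print(f"after {n:2d} later games the two ratings differ by {gaps[n]:.1f}")
```

With these settings the initial 32-point gap falls to a few points after 50 games. A larger K washes errors out faster, and a smaller K more slowly.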

Statistics books point out that measures of past performance are often terrible
indicators of the future when used to forecast future results.  Why?  When there
are detectable trends, or factors that signal trends, a static measure of past
performance ignores the trend information that other forecasting methods can
exploit.
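A minimal Python sketch of the contrast, using an invented rating history (the numbers are hypothetical, chosen only to show an upward drift). The static forecast simply repeats the last rating. The trend forecast fits an ordinary least-squares line against event number and extrapolates it one event ahead. A straight-line trend is just one simple choice, not a recommendation.

```python
def last_value_forecast(history):
    """Static measure: the most recent rating is the forecast."""
    return history[-1]


def linear_trend_forecast(history):
    """Fit rating = a + b*t by least squares (t = 0, 1, 2, ...),
    then extrapolate one step ahead to t = len(history)."""
    n = len(history)
    t_mean = (n - 1) / 2
    y_mean = sum(history) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(history))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept + slope * n


ratings = [2400, 2414, 2418, 2433, 2440]  # hypothetical ratings after 5 events
print(last_value_forecast(ratings))              # 2440
print(round(linear_trend_forecast(ratings), 1))  # 2450.7
```

Which forecast turns out closer depends on whether the drift continues. The point is only that the static forecast cannot see the drift at all.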

A program's old, last rating may not be the best predictor of its next, new
rating.  ELO-based systems do calculate a Winning Expectancy, compare it with the
Actual Result, and change the rating according to the difference between the
two, normally event by event.  Even so, the ELO system doesn't try to predict
trends.  It estimates only the player's current (latest) strength, weighting
recent games and their results progressively more heavily.
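For reference, the expectancy-and-difference update described above is usually written in the logistic form below. Here R_A is the player's rating, R_i is the rating of the opponent in game i of an n-game event, S_{A,i} is the actual score in that game (1, 1/2 or 0), and K is the K-factor. Elo's original formulation was based on the normal distribution, and K varies between federations, so treat this as the common textbook version rather than any particular organization's rules. Note that nothing in it refers to time or trend: the only state carried forward is the current rating.

```latex
E_{A,i} = \frac{1}{1 + 10^{(R_i - R_A)/400}}, \qquad
R_A' = R_A + K \sum_{i=1}^{n} \left( S_{A,i} - E_{A,i} \right)
```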

The ELO system assumes that fluctuations up and down are normal, natural, and not
biased toward either growth or decline.  It doesn't really try to predict a trend.

The biggest humor in Dave's prior posting (an irony, really) is that he seems to
*jump* from the current discussion of how well a program has done in the past
(our perennial question--are top programs GMs *now* or not!) to thinking about
how to predict better (with a smaller SEE, Standard Error of Estimate, or
Standard Error of Forecast) how a program will do in the future.
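Since SEE and Standard Error of Forecast come up here, a small Python sketch of the textbook simple-regression versions may help. SEE is the square root of the residual sum of squares divided by n - 2. The standard error of a forecast at a new point t_new inflates SEE by the square root of 1 + 1/n + (t_new - t_mean)^2 / Sxx. The rating history, and the choice of a straight-line fit against event number, are hypothetical.

```python
import math


def fit_line(ys):
    """Ordinary least-squares line y = a + b*t, with t = 0, 1, 2, ..."""
    n = len(ys)
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    b = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys)) / sxx
    a = y_mean - b * t_mean
    return a, b, t_mean, sxx


def standard_errors(ys, t_new):
    """Return (SEE, standard error of forecast at t_new) for a line fit."""
    n = len(ys)
    a, b, t_mean, sxx = fit_line(ys)
    sse = sum((y - (a + b * t)) ** 2 for t, y in enumerate(ys))
    see = math.sqrt(sse / (n - 2))  # n - 2: two parameters were estimated
    sef = see * math.sqrt(1 + 1 / n + (t_new - t_mean) ** 2 / sxx)
    return see, sef


ratings = [2400, 2414, 2418, 2433, 2440]  # hypothetical ratings after 5 events
see, sef = standard_errors(ratings, t_new=len(ratings))
print(f"SEE {see:.2f}, standard error of next-event forecast {sef:.2f}")
```

The forecast error is always larger than SEE, and it grows the further ahead (larger t_new) one extrapolates.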

This brings up my wry thought (perhaps humorous only to me) that we haven't
yet actually tried to predict a program's future rating or future performance
from any trends, and that the mathematics (statistics, etc.) for such
forecasting will have to be developed, likely based on the ELO system but with
some trend components thrown in.
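One existing technique that fits "ELO with some trending aspects thrown in" is Holt's linear (double) exponential smoothing. It tracks a smoothed level plus a smoothed trend, and could be run over a sequence of post-event ratings. Below is a minimal sketch. This is a general time-series method swapped in for illustration, not an established rating system, and the smoothing constants alpha and beta are hypothetical, untuned values.

```python
def holt_forecast(history, alpha=0.5, beta=0.3, steps_ahead=1):
    """Holt's linear exponential smoothing forecast.

    history: ratings in time order (at least two values).
    alpha, beta: level and trend smoothing constants in (0, 1); the
    defaults here are illustrative, not tuned.
    """
    if len(history) < 2:
        raise ValueError("need at least two ratings to estimate a trend")
    level = history[0]
    trend = history[1] - history[0]  # initial trend guess
    for y in history[1:]:
        prev_level = level
        # Blend the new observation with the previous one-step forecast.
        level = alpha * y + (1 - alpha) * (level + trend)
        # Blend the latest change in level with the previous trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + steps_ahead * trend


ratings = [2400, 2414, 2418, 2433, 2440]  # hypothetical ratings after 5 events
print(f"last rating {ratings[-1]}, Holt forecast {holt_forecast(ratings):.1f}")
```

With beta near 0 the trend estimate barely moves from its initial guess. With beta near 1 it chases the latest change, including noise. Choosing alpha and beta is exactly the kind of "to what degree" question the post anticipates.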

To the non-statisticians and non-mathematicians out there: I am not talking
about forecasting how strongly a program will play in the distant future, on
vastly superior hardware, after many years of research improving its search and
evaluation routines.  I am talking only about the math tools that would predict,
as accurately as possible (smallest SEE), how a given comp would perform against
a given opposition in the next event (or series of events) in the immediate
future.
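Once a rating is chosen, the "given comp against a given opposition in the next event" forecast is straightforward: sum the single-game expectancies over the scheduled opponents. A minimal sketch, assuming the logistic expectancy and a hypothetical field of opponents. The open question is which rating to feed in: the current one, or some trend-adjusted forecast (2520 below is a hypothetical stand-in for the latter).

```python
def expected_score(rating: float, opp_rating: float) -> float:
    """Single-game winning expectancy (logistic form, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((opp_rating - rating) / 400.0))


def expected_event_score(rating: float, opponents: list[float]) -> float:
    """Forecast total score for an event: the sum of per-game expectancies."""
    return sum(expected_score(rating, opp) for opp in opponents)


# Hypothetical field for the next event.
field = [2450, 2500, 2550, 2600]

# Same field, two candidate ratings for the program.
for label, rating in (("current", 2500), ("trend-adjusted", 2520)):
    print(f"{label}: {expected_event_score(rating, field):.2f} / {len(field)}")
```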

Indeed, for certain programs the trend might be downward.  Once a program is
released, human players might learn its weaknesses and get better at playing
against its style and its 'holes'.

To me, it is clear that the ELO expectancy methodology is not the best tool for
such predictions, since some programs are evidently being improved and some
human players are learning and getting better.

If the current debates over how strong a program *is* are so contentious, what
will the debates over those future calculations be like?  Which trend aspects will
be used, and to what degree?  Endless amusement for some, I guess (I sure like
statistics and forecasting!).  But what of the others, the consumers?  'Hey, I
ordered a better appliance, not one that works *this* way; it doesn't do what I
want (clean the house, put out the dog, cook the meals).'

Ho, ho!  Off we go!

--Steve