Author: Stephen A. Boak
Date: 19:58:27 05/19/00
On May 19, 2000 at 22:29:34, Stephen A. Boak wrote:

>:)
>
>There are many notions as to what should be rated or not rated. None are right and none are wrong, a priori, i.e. in the abstract. Judgement of the notions depends on what one is trying to achieve (measure, rate, etc).
>
>Humorous to me is something I was contemplating a bit earlier, when posting elsewhere (above or below I don't remember at this instant) about some rating calculations I do for myself (TPR, GPR, ATPR, AGPR, etc) and other ideas for interesting statistics and measures (I like graphs, especially, of time series trends):
>
>By necessity our ELO-based calculations are founded on past games. We are obtaining a measure, largely, of how a program *has* performed (in the past).
>
>Statistics books will warn you, just like stock prospectuses and advertisements for investment funds, 'Past performance is no guarantee of similar future performance, which may not be as good.' or some such similar phrase.
>
>It is only when we consider program performance trends (over time) that we begin to think about how a program will perform in the future (next game, next event, etc).
>
>--Steve

Whoops, I hit Submit Follow Up accidentally before finishing my lecture. :)

The ELO-based systems rely on natural variation as a premise, as well as certain statistical ideas. A player's rating can fluctuate up or down, and may over time gradually rise or gradually fall (in general), although it will *always* fluctuate up and down, event by event, etc.

The ELO-based systems rely (due to the nature of the mathematical calculations, carried out event by event, generally in time sequence) more heavily on recent games than ancient games.
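For readers who want the event-by-event calculation spelled out, here is a minimal sketch of the usual Elo expectancy and update step. The 400-point logistic scale is the standard Elo formula; the K-factor of 16, the function names, and the sample opponents and results are assumptions made purely for illustration, not figures from any rating list.

```python
def expected_score(rating, opp_rating):
    """Winning expectancy on the standard 400-point logistic Elo scale."""
    return 1.0 / (1.0 + 10.0 ** ((opp_rating - rating) / 400.0))


def update_rating(rating, opp_rating, score, k=16.0):
    """One game-by-game update: move by k * (actual result - expectancy).

    k=16 is an assumed K-factor; rating bodies use different values.
    """
    return rating + k * (score - expected_score(rating, opp_rating))


# Hypothetical sequence of (opponent rating, result) pairs, in time order.
rating = 2400.0
for opp_rating, score in [(2450.0, 0.5), (2380.0, 1.0), (2500.0, 0.0)]:
    rating = update_rating(rating, opp_rating, score)
    print(round(rating, 1))
```

Because every update is measured against the expectancy at the *current* rating, later games partly undo whatever earlier games contributed, which is where the heavier weight on recent results comes from.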
The fact, for example, that one game was rated incorrectly as a loss (say, due to a reporting inaccuracy), when in reality it was a win, becomes irrelevant (lost in the noise level--probably even below the noise threshold) after perhaps 50 or so subsequent games have been played and rated.

Statistics books indicate that measures of the past are often terrible indicators of the future--when used to forecast future results. Why? When there are detectable trends, or factors that indicate trends, a static measure of past performance does not use the trend information that other forecasting methods may exploit. The old, last rating of a program may not be the best predictor of the next, new rating of a program.

ELO-based systems use Winning Expectancy calculations and Actual Results, and calculate rating changes from the deltas between expectancy and actual result, normally event by event. Even so, the ELO system doesn't try to predict trends, but merely estimates the current (last) strength of the player, weighting recent games and their results progressively more heavily. The ELO system assumes fluctuations up and down are normal, natural, and not biased toward either growth or decline. It doesn't really try to predict trend.

The biggest humor in Dave's prior posting (an irony really) is that he seems to *jump* from current discussions about measuring how well a program has done in the past (our perennial question--are top programs GMs *now* or not!) to thinking about predicting better (smaller SEE, Standard Error of Estimate or Standard Error of Forecast) how a program will do in the future.

This brings up my wry thought (perhaps personal humor to me only) that we haven't actually been trying to predict a program's future rating or future performance (as yet) based on any trends, and that the mathematics (statistics, etc) for forecasting will have to be evolved, based on the ELO system (likely) but with some trending aspects thrown in.
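The remark about a misreported game getting lost in the noise can be checked with a rough sketch: run two rating tracks that differ only in how game 1 was recorded, feed both the same later results, and see how much of the gap survives. Everything concrete here (K-factor of 16, a 2400-rated player facing 2400-rated opposition, alternating losses and wins afterwards) is an assumption chosen for illustration.

```python
def expected_score(rating, opp_rating):
    """Winning expectancy on the standard 400-point logistic Elo scale."""
    return 1.0 / (1.0 + 10.0 ** ((opp_rating - rating) / 400.0))


def update_rating(rating, opp_rating, score, k=16.0):
    """One game-by-game Elo update with an assumed K-factor."""
    return rating + k * (score - expected_score(rating, opp_rating))


def gap_after(n_games, k=16.0, start=2400.0, opp=2400.0):
    """Rating gap between the correct and the misrecorded track after n_games."""
    correct = update_rating(start, opp, 1.0, k)  # game 1 rated as the win it was
    wrong = update_rating(start, opp, 0.0, k)    # game 1 misreported as a loss
    for game in range(1, n_games):
        # Identical later results on both tracks (hypothetical: loss, win, loss, ...).
        score = 1.0 if game % 2 == 0 else 0.0
        correct = update_rating(correct, opp, score, k)
        wrong = update_rating(wrong, opp, score, k)
    return correct - wrong


initial_gap = gap_after(1)
gap_after_50 = gap_after(50)
print(f"gap right after the misreported game: {initial_gap:.1f}")
print(f"gap after 50 rated games:             {gap_after_50:.1f}")
```

Under these assumptions the error shrinks with every later game, because the track that was rated too low is expected to score less and so gains a little more from each identical result. A larger K-factor starts with a bigger error but erases it faster.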
To the non-statisticians or non-mathematicians out there--I am not talking about forecasting how strong a program will play in the distant future, on vastly superior hardware, after many years of research have improved its search and evaluation routines. I am merely talking about the math tools that would predict as accurately as possible (smallest SEE) how a given comp would perform against a given opposition in the next event (or series of events) in the immediate future.

Indeed the trend might be downward, for certain programs. Once a program is released, the human players might learn its weaknesses and get better at playing against the comp styles and 'holes'.

To me, it is clear that the ELO expectancy methodology is not the best for such predictions (since clearly some programs are being improved, and some human players are learning and getting better).

If the current debates on how strong a program *is* are so contentious, how contentious will the debates on those future calculations be? What trend aspects will be utilized, and to what degree--endless amusement for some, I guess (I sure like statistics and forecasting!).

But what of the others, the consumers--hey I ordered a better appliance, not one that works *this* way, it doesn't do what I want (clean the house, put out the dog, cook the meals).

Ho, ho! Off we go!

--Steve
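To make the "smallest SEE" idea concrete, here is a small sketch that compares two one-step-ahead forecasters on a made-up rating history: one simply repeats the last rating, the other adds the average recent change. The history, the 4-event window, and the use of root-mean-square forecast error as a stand-in for SEE are all assumptions for illustration; this is not a proposal for how such forecasting should actually be done.

```python
import math


def rmse(errors):
    """Root-mean-square error, used here as a simple stand-in for SEE."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def last_rating_forecast(history, t):
    """Static forecast: the next rating equals the last known rating."""
    return history[t - 1]


def trend_forecast(history, t, window=4):
    """Trend forecast: last rating plus the average change over `window` events."""
    recent_change = (history[t - 1] - history[t - 1 - window]) / window
    return history[t - 1] + recent_change


def forecast_rmse(history, forecaster, start):
    """Score a forecaster on every event from index `start` onward."""
    errors = [history[t] - forecaster(history, t) for t in range(start, len(history))]
    return rmse(errors)


# Hypothetical history: a program gaining about 5 points per event,
# with a small repeating up-and-down fluctuation on top.
WIGGLE = [3, -2, -3, 2]
history = [2400 + 5 * i + WIGGLE[i % 4] for i in range(24)]

naive_rmse = forecast_rmse(history, last_rating_forecast, start=5)
trend_rmse = forecast_rmse(history, trend_forecast, start=5)
print(f"last-rating forecast error: {naive_rmse:.2f}")
print(f"trend forecast error:       {trend_rmse:.2f}")
```

On a history with a steady drift like this one, the trend forecaster comes out ahead; on a history that only fluctuates around a fixed level, adding a trend term would mostly add noise. Deciding how much trend to use is exactly the argument waiting to happen.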
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.