Author: Ratko V Tomic
Date: 00:55:55 07/30/00
Most of the objections you brought up arise from the apparently different semantics of the terms "model" or "modeling" we assumed. Being by education a theoretical physicist, in my usage a model is a mathematical (or computational) procedure designed to mimic some phenomenon or aspects of a phenomenon. The fundamental natural laws used in physics represent the seed or generator of the models "grown out" of these seeds to mimic a particular phenomenon of interest. The procedure of "growing out" a model (capable of predicting some measurement results) from the fundamental laws involves applying initial and boundary conditions to the laws (which are usually in the form of partial differential equations or integro-differential equations). Once the model is constructed, it is run forward (via some computation) to find out what will happen within the model. If the seed was right and was grown right, the prediction will match the measurement on the actual phenomenon.

In practice this process is iterative: should a measurement result differ significantly from the prediction, one goes back, assuming in the first phase that there was a mismatch between the initial/boundary conditions (which were applied to the fundamental laws/seed) and the actual initial and boundary conditions which existed in the tested physical system. The instruments and setup may be recalibrated here and tested against some auxiliary model of this sub-experiment. If that fails, one assumes that the desired initial/boundary conditions are not practically feasible within the available setup technology, and one can try working with some more feasible form of initial/boundary conditions. But once the match between the initial/boundary conditions is established (and verified) and any computational errors are excluded, further mismatches between the model prediction and the experiment imply that the fundamental laws are either incorrect or, more often, incomplete (i.e. there are additional phenomena involved in the setup which the laws deployed to create the model didn't capture). That's a rough sketch illustrating my semantics of the terms "model" and "modeling." Or briefly: a model of some system is another, scaled-down or toy system which mimics, to some degree or in some aspects, the system being modelled. The motivation for modeling is that it is much cheaper and safer to run the toy model forward (in time/space) and see what happens (at least partially) than to do it with the real system and deal with irreversible consequences.

Now, going to chess performance modeling -- the model here, in the most general case, is any computational procedure designed to predict the outcome of one or more games between some set of opponents. In the extreme case of a "perfect predictor model" for some set of players, this procedure would predict exactly the outcome of each game. Computer programs are deterministic finite state machines: a fairly small amount of input data (of the order of a few MB) specifying the initial/boundary state (which includes all the program settings, learned data, books, clocks, seeds for any random number generators, etc.) determines uniquely every move the program will make next. Of course, there is no practical reason for anyone to invest time and money in this type of perfect modeling of chess programs, but at least in principle a perfect predictor could be made, i.e. for comp-comp games the perfect predictor exists at least in principle.
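A toy illustration of that determinism point, in Python (the toy_engine stand-in is hypothetical, of course, not a real chess program): a deterministic program's entire future output is fixed by its initial state, so a replica initialized identically predicts every "move" exactly.

    import random

    # Toy stand-in for a deterministic program: the seed plays the role
    # of the complete initial/boundary state (settings, books, RNG seeds).
    def toy_engine(seed, n_moves):
        rng = random.Random(seed)
        return [rng.randrange(100) for _ in range(n_moves)]

    game      = toy_engine(seed=42, n_moves=10)   # the "real" run
    predicted = toy_engine(seed=42, n_moves=10)   # the perfect predictor
    assert game == predicted                      # exact move-by-move match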
For human-human or human-comp sets of players, we don't know whether the perfect model exists even in principle. That depends, first, on whether the _ultimate_ fundamental laws of physics are intrinsically non-deterministic or deterministic. Although the present quantum theory is non-deterministic, there may exist an underlying deterministic theory (Bell's theorem notwithstanding) replicating the statistical predictions of quantum theory. But even if the ultimate laws are fundamentally non-deterministic, it may still be the case that the brain processes involved in chess playing (or general thinking) are sufficiently deterministic, in the sense that some finite set of measurements will allow prediction of the brain's computation result for some limited time interval into the future. As an example of this type of practically deterministic process on top of a more fundamental non-deterministic one, take a regular digital computer: although its underlying processes are quantum, thus non-deterministic within the present fundamental postulates, it is designed in such a way that a finite set of initial "measurements" (such as examining the memory and CPU content) determines uniquely and with practical certainty (although not absolute certainty) what similar "measurements" will yield at any future instant.

So far the conclusion is that the perfect performance predictor model exists, at least in principle, for any comp-comp play. It is an open question whether such perfect predictors exist, even in principle, for human play. Of course, in the comp-comp case this perfect model would predict much more than what ELO aims to predict -- not only some long-term average performance with a certain probability, but the outcome of individual games and of each move in each of those games. In any case, this certainly closes the issue of whether ELO is the best model possible for predicting the outcome of comp-comp chess games. It is clearly not. One can say in general that any finite deterministic system (such as comp-comp chess playing) can be modelled perfectly on a computer, be it in the trivial way via an exact state machine replica or in more sophisticated ways (taking advantage, for example, of shortcuts bypassing any sub-optimality in the state transition computation of the original system), so the statistical models in such cases are merely sub-optimal practical necessities.

Much of your discussion was about imperfect models, the models capable of predicting performance only approximately or statistically. The question of interest is whether the ELO model is the best one "practically" possible with present-day technology. Although this is a bit of a fuzzy question (e.g. what is "practical," and "practical" for whom), I think from Uri's and my earlier comments it should be clear that the ELO predictor can be improved in many ways, by including additional information as input to the predictor. The ELO model takes as its input very little of the information regarding past games which is practically (and very easily) available. In particular, the plies of the game are certainly available and reveal more about a player's strengths (and weaknesses) than the mere result.
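For concreteness, here is what the plain ELO predictor actually computes: a single expected score from the two ratings alone. A minimal sketch in Python, using the standard 400-point logistic Elo formula (the function name is mine):

    # The plain ELO predictor reduces everything known about two players
    # to one number each, and predicts only an expected score.
    def elo_expected_score(rating_a, rating_b):
        """Expected score of A against B (1 = win, 0.5 = draw, 0 = loss)."""
        return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

    # Example: a 2600-rated player against a 2400-rated one.
    print(elo_expected_score(2600, 2400))  # ~0.76, i.e. about a 76% score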
In my initial post I offered a rough comparison of how much more information is easily available this way: ELO utilizes at most 1.58 bits of information from the entire game (log2(3), for the three possible outcomes win/draw/loss), while the plies alone of a typical game contain a couple of hundred times more information (about 5 bits per ply, roughly log2 of the typical number of legal moves). At least part of this extra information will have predictive value.

An example of a simple improvement would be not to attribute an autonomous ELO to each player alone, but to all pairs of players. For example, if a pair of players A and B have played some number of games in the past, one could create the pair AB ELO difference, D(A,B). Before A and B ever played each other, D(A,B) could simply be Elo(A)-Elo(B). But once they have played each other, the results of those games could be weighted higher (how much higher one would have to find out empirically) than their results against others, resulting in a pair of numbers Elo(A,+B) and Elo(B,+A), with D(A,B) being the difference of these numbers (specific to this pair of players). For an N-player pool the resulting predictor model would contain not N ratings, as ELO does, but N*(N-1)/2 pair ratings. If one wishes to order players into a one-dimensional ranking list, it could be computed from the pair ratings by first computing the predicted scores among all pairs, using them as if a full round-robin tournament had been played and these predicted scores obtained, then applying the usual tournament rules for tie-breakers.

A further refinement of the pair rating method would be to use additional classifications of all players by playing styles and favorite openings (with some generic classification value for "unknown"). Then for a given player pair A and B, we would also have style attributes for each, as vectors S(A) and S(B), as well as all the past results of A or B against all other players Z with styles S(Z). One could then use all the past results of A but weight them (for the purpose of the A-B specific pair rating) so that the result of A versus Z is counted with a greater weight the "more similar" the style of Z is to the style of B (and analogously for B). Of course, the operational meaning of "more similar" would have to be determined empirically as some form of distance function between the "feature vectors" S(Z) and S(B). The weight would be maximal when the distance is 0 (i.e. when S(Z)==S(B), a special case of which is Z==B). This kind of model would therefore be a generalization of the previous one. (A sketch of both refinements follows below.) Either model would be a better predictor of a player's performance than the one-dimensional ELO, since it is common knowledge that a player's performance is sensitive to the type of opponent. While in the pre-computer era this type of pair modeling was impractical, today it is perfectly feasible: the style vectors S() could be extracted by a program from the plies of the games via suitable evaluation functions, and the opening repertoire would be trivially available.

As to why we still use ELO when predictions could be made more accurate -- there is no incentive, financial or otherwise, for a stand-alone performance predictor beyond ELO. Even ELO is overkill, since most chess games played in the world are "non-rated" or "unofficial." If there were substantial money to be made from predicting player performance better than ELO (say, some high-stakes betting), there would be any number of sophisticated models, as there are for stock market predictions.
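A minimal sketch of the two refinements above, in Python. Everything here is a hypothetical illustration of the idea, not a worked-out system: the blending constant PAIR_WEIGHT and the exponential decay shape of the style weight are exactly the quantities that, as noted, would have to be fitted empirically.

    import math

    PAIR_WEIGHT = 4.0   # assumed: how much more one head-to-head game counts

    def pair_elo_difference(elo_a, elo_b, h2h_score, h2h_games):
        """D(A,B): pair-specific rating difference of A over B.

        elo_a, elo_b -- plain Elo ratings of A and B
        h2h_score    -- A's total score in past games vs B (1/0.5/0 each)
        h2h_games    -- number of past games between A and B
        """
        base = elo_a - elo_b
        if h2h_games == 0:
            return base            # never met: plain Elo difference
        # Convert the observed head-to-head score fraction into an
        # equivalent rating difference (inverse of the 400-point logistic
        # formula), clamped to avoid infinities at 0% or 100% scores.
        p = min(max(h2h_score / h2h_games, 0.01), 0.99)
        observed = -400.0 * math.log10(1.0 / p - 1.0)
        # Blend pool-wide and head-to-head evidence; each head-to-head
        # game counts PAIR_WEIGHT times as much.
        w = PAIR_WEIGHT * h2h_games
        return (base + w * observed) / (1.0 + w)

    def style_weight(s_z, s_b):
        """Weight of A's past game vs Z when predicting A vs B: maximal
        (1.0) when S(Z) == S(B), decaying with the distance between the
        style "feature vectors"."""
        return math.exp(-math.dist(s_z, s_b))   # assumed decay shape

The predicted score for the pair is then just the logistic formula applied to D(A,B), i.e. 1/(1+10^(-D/400)), and a full round-robin of such predicted scores gives the one-dimensional ranking described above.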
Chess is only a game, with relatively small financial stakes involved for the players, and even smaller stakes in performance predictions. Now, the software companies making chess programs do have some incentive to streamline their testing against the market competitors. Tuning against the chief competitors, be it via killer books or parameter adjustment, utilizes ad hoc models (human-enhanced methods) which go beyond ELO in performance prediction quality. The companies may hire strong chess players as advisors, to examine closely the games against the chief opponents and suggest what stylistic changes (e.g. parameter adjustments), opening variations, or endgame techniques could improve the program's performance against a given opponent. Even amateur programmers use whatever chess knowledge they may have to tune their programs after observing the games ply by ply.

In either case, if all one took into account here were the mere final game results (the ELO-only predictor), the tuning process would be orders of magnitude slower, since many games would have to be played with all possible settings to find the optimal one. A human examining the games may, after a mere handful of them, advise an adjustment to king safety or pawn structure, or the addition of some endgame technique... which will with high probability improve the play much quicker than blind trial and error with only ELO used (I'll label this method "ELO-only blundering"; a caricature of it in code follows below). But this whole cycle of playing the program under human supervision and input, then tuning the program's parameters, does represent a more sophisticated (and more effective) model for predicting the program's performance than plain ELO. The plain-ELO-based cycle would operate as a blind parameter adjustment followed by a large number of games, followed by the next adjustment, followed by a large number of games... blundering blindly through the vast many-dimensional space of program parameters (which also includes opening variation choices and endgame techniques, in addition to the narrow evaluation function parameters). The human "tuner" will often cut shortcuts toward better performance through this parameter space by using the ply-by-ply information and his chess knowledge, all of that input completely invisible to the ELO model.

By suggesting improvements the "tuner" is actually making a prediction about the performance of the adjusted program, saying, for example, that the new program, which consists of the current program plus these opening lines removed or added, or this endgame technique or tablebase added, will perform better than the previous one. And for many types of tuning suggestions (especially those dealing with obvious holes in the program's knowledge) this type of prediction will be correct: the new program will play better. The ELO model can either give the "new" program the default "unknown" rating (if it is treated as new) or treat it as the "old" one and leave the rating unchanged. In any case, the plain ELO model will mostly perform worse as a predictor of the "tuned" program than the (computer-assisted) human tuner will. Revisiting now the human-comp or human-human modelling: the role of performance "tuner" for a human player is played by a chess coach and by the player himself. Again, the same reasoning as for program tuning applies -- a coach (or the player, using books and programs) may suggest any number of changes to his play which would, with good odds, improve it.
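The promised caricature of the "ELO-only blundering" loop, sketched in Python. Everything here is hypothetical illustration: the play_match helper, the integer-valued knobs, and the game counts are assumptions standing in for a real testing setup.

    import random

    def elo_only_blundering(params, play_match, rounds=100, games=200):
        """Blind hill-climbing through the program's parameter space, with
        only the aggregate match score (all the plain ELO model can see)
        as feedback. play_match(params, games) is a hypothetical helper
        that plays a match of the given length and returns A's score
        fraction."""
        best = play_match(params, games)
        for _ in range(rounds):
            candidate = dict(params)
            knob = random.choice(list(candidate))      # blind: pick any knob
            candidate[knob] += random.choice([-1, 1])  # blind: nudge it
            score = play_match(candidate, games)
            if score > best:                # keep the change only if the
                params, best = candidate, score  # score fraction rose
        return params

Every accepted or rejected change costs a full match of `games` games, while a human "tuner" reading the plies can often reach the same adjustment (say, to king safety) after a mere handful of games.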
If the "tuning" were left to plain ELO model, the player would have to blindly try any change then play many games, then check the ELO, then try other changes, then play many games,... etc. And all throughout this blundering one would keep changes which increased the ELO and back off from changes which decreased it. Nobody would ever move ahead this way. It is precisely the intelligent use of the great deal of additional information produced in the player's past games (beyond just the final game result) which makes possible improvements within human lifetime. And the "intelligent use" which streamlines greatly the ELO-only based blundering means being able to predict performance (in the form of the change of perfromance from the present one) better than plain ELO could. ELO model in fact doesn't predict a change in performance as correlating with anything in particular. Since most humans can be (self-) coached to improve, it means that the successful coaching method which performs better than the random ELO-only blundering, does necessarily include in its cycle a performance predictor which can correlate multitude of factors with performance (in a form of performance change correlating with some playing reasoning change), the factors which are completely invisible to the plain ELO model. In conclusion, this demonstrates that better than ELO models for predicting chess performance, be it for human or for computer players, do exist and are actually used all the time as the necessary (albeit not commonly emphasized under "predictor" label) cog of the tuning/training procedures. If there were some significant incentives in the pure prediction quality, decoupled from the playing improvement objectives, the same methods would have been reformulated or extracted into the stand alone performance predictor substantially outperforming ELO (as it does when used as a cog).