Author: Ratko V Tomic
Date: 13:23:27 07/28/00
>> For a small number of games the judgment of a knowledgeable human player
>> is clearly a better predictor.
>
> Maybe, but it's far from certain.

It is quite certain. A good player playing a handful of games against a program knows not only the result of the games but also exactly how those results came about, all the ply-by-ply struggles and opportunities, whether missed or noticed by the program. Of course he can't tell you all the loopholes in the opening book or the endgame knowledge. But the rating calculation won't tell you any of that either. In a small number of games the judgment of a good player is far superior to the rating computed from the same set of games. It is not even a close call.

>> The rating as a predictive model amounts to no more than essentially saying:
>> the results so far were A:B, so I predict that they will most likely remain
>> A:B. That is really the most simple-minded kind of prediction one can make
>> about anything.
>
> That may be true. But making conclusions after observing a handful of games
> is just plain stupid.

Refusing to make conclusions (however provisional) on whatever information one has at any given time is obviously more stupid than making conclusions throughout and revising them as more information arrives. The strength of human-style real-time, continuous modeling and model revision, i.e. of human intelligence, is precisely in making preliminary conclusions, the working models, from a very scant amount of data. For some reason you absurdly claim that improving one's predictive odds (by modeling the situation on whatever information is available) is stupid.

>> Imagine such a predictor applied to 5 coin tosses, where 4 came out heads
>> and 1 tail. A human would predict that on 1000 tosses the most likely
>> outcome would be 500:500, while the rating would predict 800:200. If I were
>> to bet on who will come closer over 1000 tosses, I would pick the human
>> every time. A human observer uses additional information to make a much
>> better prediction (such as observation and knowledge of the degree of
>> motoric control the person tossing the coin could have).
>
> That's nonsense. You don't extrapolate on such a small basis due to the
> uncertainty of the result.

You seem to miss the point. No one is saying that more games are worse than fewer games. The question was whether something more useful or efficient than a flat statistical model (rating models assume a memoryless process, similar to coin tossing) can be done when you have a small number of games. The answer is: yes, much more can be extracted from a few games than a rating calculation can do. (A small numeric sketch of the coin-toss comparison follows below.) I would bet my money on the prediction/judgment of a strong player experienced in playing against programs after, say, he played 5 games against the program, over a rating computed from those five games, or even a rating from 100 games. I would also trust his prediction based on such a small and non-representative sample more if one had to bet on how the program will do against a third player.

> If you had any idea about chess games played by a single program, you would
> know that it's capable of covering the entire spectrum from brilliancies to
> blunders. A good player would be unable to fathom all aspects of most
> top-level programs within a handful of games.

Well, of course, when evaluating programs, all else being equal, the more a person knows about the programs, in general and about earlier versions of the tested program, the more accurate his prediction will be.
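To make the coin-toss comparison concrete, here is a minimal sketch (my own illustration, not part of the original argument) contrasting a flat frequency extrapolation with an estimate that folds in prior knowledge; the Beta(50, 50) prior is an assumed stand-in for the human observer's outside confidence that the coin is roughly fair:

```python
heads, tails, future_tosses = 4, 1, 1000

# "Rating-style" predictor: extrapolate the observed frequency directly.
# 4 heads out of 5 tosses -> predicts roughly 800:200 on 1000 tosses.
freq_estimate = heads / (heads + tails)
print("frequency model:", round(freq_estimate * future_tosses), "heads")

# Prior-informed predictor: posterior mean of a Beta(a, b) prior updated
# with the 5 observed tosses.  With a strong fair-coin prior the five
# tosses barely move the estimate, so the prediction stays near 500:500.
a, b = 50, 50  # assumed prior strength, for illustration only
posterior_mean = (a + heads) / (a + b + heads + tails)
print("prior-informed :", round(posterior_mean * future_tosses), "heads")
```

Running it gives about 800 heads for the frequency model and about 514 for the prior-informed one, which is the gap the example is pointing at.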
As to your point about the human player/evaluator missing the whole spectrum, etc.: the rating formula knows even less about the "spectrum" and the "brilliancies" than a human player does. The rating calculation can see only loss, draw, or win from the entire game, i.e. at most 1.58 bits of information per game (a short calculation follows at the end of this post). Even without deploying any GM-level chess knowledge, there is more (predictively usable) information about the program's strength in a single move than what the rating calculation extracts from the whole game.

> And what measure of comparison would he use? He can't use another chess
> program, because that would increase the uncertainty, as this particular
> opponent might amplify strengths or weaknesses in the program whose
> capabilities you're estimating.

The measure of a predictive model is how well it predicts. If you wanted to verify whether a human player models the relative program strengths better than the rating calculation does on the same small set of games, one could, for example, have the human evaluator observe and analyze the few games between the programs and make his prediction for the next 100 games. At the same time the SSDF can compute ratings from these same games and have that rating predict the result of the next 100 games. If you had to bet on who will come closer to the actual result in the next hundred games, the human predictor or the rating formula, whose prediction would you pick?

Of course, this isn't meant to say that the conventional rating is useless. My original point is that the folks ridiculing a human player's judgment after a few games as worthless are misapplying, out of ignorance and/or malice, the uselessness of simple-minded statistical modeling (such as a memoryless steady process) to the strengths of human modeling of situations with high uncertainty (such as a very small sample of games).
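For the record, the 1.58 bits figure comes from a game result being one of three outcomes. The small calculation below is my own sketch; the five-way per-move scale is purely an assumed illustration, not anything the rating system or the post defines:

```python
import math

# A game result is one of three outcomes (win / draw / loss), so the
# rating calculation sees at most log2(3) bits per game.
bits_per_game = math.log2(3)
print(f"upper bound per game result: {bits_per_game:.2f} bits")  # ~1.58

# By contrast, even a crude per-move label such as best / good /
# inaccuracy / mistake / blunder (a toy 5-way scale, assumed here for
# illustration) could carry up to log2(5) bits per move, over dozens
# of moves per game.
bits_per_move = math.log2(5)
moves = 40
print(f"toy per-move upper bound over {moves} moves: {bits_per_move * moves:.0f} bits")
```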