Author: Ratko V Tomic
Date: 15:25:45 07/28/00
> However, the statistical model doesn't give absolute values, only
> estimate and uncertainty. Coincidentally, the same applies to
> human evaluation.

Well, they're both scale models of a phenomenon, i.e. they're not identical with the phenomenon, so they won't reproduce/mimic the phenomenon exactly. The question is which model, human evaluation or the rating computation on a small sample of games, has _greater_ predictive power regarding the future games. Or, if you had to bet which one will come closer, would you pick the human judgment or the rating prediction based on a handful of games? (A rough sketch of the numbers involved is at the end of this post.)

> The assumption that a human player, however capable, can estimate
> within a broad margin of error the strength of a program with
> very little data, is not correct.

Still, it would be much better than the rating-based prediction (based on the same set of games).

> To make stable evaluations, you need stable performance. Human players
> do that most of the time, computer programs don't, mixing blunders and
> brilliancies regardless of strength. So evaluation based on a single
> game is, as you say, simple-minded.

Yes, but there are patterns to the variability, and a human is surely better at picking out such patterns than a simple memoryless-source statistical model (the basis for coin flipping and for the rating computation).

> The defence of human interpretation isn't wrong. It's just not
> relevant to the case at hand, since the precision of empirical
> evaluation is based on a large sample of games. Small samples are
> not interesting for estimation.

They're interesting when you don't have much data, such as with DB II or, say, if you're playing against a new program at a tournament for which you can obtain only the handful of games it played in earlier rounds. The strength of human judgment and pattern recognition lies precisely in these types of poorly defined (from the viewpoint of clean, idealized models) situations. Just because someone's favorite model doesn't work well in some situations doesn't mean these are uninteresting situations.

If I am buying a program, I would find it much more useful (and interesting) to read a review where a strong player describes his impressions and judgments while playing a handful of games against the program than to read that the program scored 158.5:141.5 against Fritz.

Of course, a human cannot keep track of, much less analyze, hundreds of thousands of games among thousands of players to make a judgment on such a large sample. So statistics is a necessary evil, i.e. in the absence of anything better one does what one can or must in such a situation. As with cooking, an individually prepared dish by an expert chef always beats the mass-produced supermarket dishes. The latter is a compromise, a tradeoff of quality for quantity. The same goes for ratings: they're the only practical and inexpensive way to keep track of a large number of players and game results, without any need to think, analyze, or know anything about chess at all.

> They're random from a statistical point of view and from a human
> estimation point of view.

They're much less random within the human model.
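
Since the rating model came up several times, here is a minimal sketch (in Python) of what it implies for small samples. It assumes the standard logistic Elo relation, performance = opponent rating + 400*log10(p/(1-p)), with a plain binomial error bar; the 2600 opponent rating is just an illustrative number, not anything taken from actual matches:

  import math

  def perf_and_error(points, games, opp_rating):
      """Performance rating and a rough 1-sigma error bar from a match score."""
      p = points / games                      # score fraction
      if not 0.0 < p < 1.0:
          raise ValueError("score must be strictly between 0 and games")
      # invert the logistic Elo expectation p = 1 / (1 + 10^(-d/400))
      perf = opp_rating + 400.0 * math.log10(p / (1.0 - p))
      # binomial standard error of p, propagated through the same formula
      se_p = math.sqrt(p * (1.0 - p) / games)
      se_rating = (400.0 / math.log(10.0)) * se_p / (p * (1.0 - p))
      return perf, se_rating

  # a handful of games: 3 points out of 6 against a 2600 opponent
  print(perf_and_error(3, 6, 2600))        # ~2600 +/- ~142 points
  # the large sample mentioned above: 158.5 points out of 300 games
  print(perf_and_error(158.5, 300, 2600))  # ~2620 +/- ~20 points

So from six games the model itself admits an error bar on the order of 140 rating points, most of a class interval; only with hundreds of games does it shrink to about 20 points.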