Computer Chess Club Archives



Subject: Re: rebel 10~!! super strong on amd k62 500

Author: Ratko V Tomic

Date: 15:25:45 07/28/00



> However, the statistical model doesn't give absolute values, only
> estimate and uncertainty. Coincidentally, the same applies to
> human evaluation.

Well, they're both scale models of a phenomenon, i.e. they're not identical with
the phenomenon, so they won't reproduce/mimic it exactly. The question is which
model, human evaluation or the rating computation on a small sample of games,
has _greater_ predictive power regarding future games. Or, if you had to bet
which one will come closer, would you pick the human judgment or the rating
prediction based on a handful of games?
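
To put a number on it, here is a quick back-of-the-envelope sketch of my own
(the function name and the 3.5/5 score are made up; the formula is the standard
logistic Elo model) showing how wide a rating estimate from a handful of games
really is:

    import math

    def perf_rating_interval(wins, games, opp_rating):
        # Rough Elo performance estimate and a ~95% interval from a small
        # sample, assuming the logistic model E = 1 / (1 + 10**((Ro - R)/400)).
        p = wins / games
        se = math.sqrt(p * (1 - p) / games)   # binomial ("coin flip") std. error
        def elo(q):
            q = min(max(q, 0.01), 0.99)       # clamp so the log stays finite
            return opp_rating + 400 * math.log10(q / (1 - q))
        return elo(p), elo(p - 2 * se), elo(p + 2 * se)

    # 3.5/5 against 2500-rated opposition: the point estimate is about 2647,
    # but the interval runs from the mid-2300s to past 3200.
    print(perf_rating_interval(3.5, 5, 2500))

After five games the interval covers the best part of a thousand rating points,
so the number by itself predicts almost nothing.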

>
> The assumption that a human player, however capable, can estimate
> within a broad margin of error the strength of a program with
> very little data, is not correct.

Still, it would be much better than the rating-based prediction (based on the
same set of games).

>To make stable evaluations, you need stable performance. Human players do that
>most of the time, computer programs don't, mixing blunders and brilliancies
>regardless of strength. So evaluation based on a single game is, as you say,
>simple-minded.

Yes, but there are patterns to the variability, and a human is surely better at
picking out such patterns than a simple memoryless-source statistical model (the
basis for coin flipping and for the rating computation).
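
As a toy illustration of what "memoryless" means here (all the parameters are
invented for the example), compare an i.i.d. coin-flip source with a "streaky"
two-state source that has the same long-run score, and hence the same rating,
but very different short-term behavior:

    import random

    def iid_games(p, n):
        # Memoryless source: every game is an independent flip with win prob p.
        return [random.random() < p for _ in range(n)]

    def streaky_games(p_good, p_bad, stay, n):
        # Two-state "form" source: good and bad stretches, switching rarely.
        good, results = True, []
        for _ in range(n):
            results.append(random.random() < (p_good if good else p_bad))
            if random.random() > stay:    # occasionally switch form
                good = not good
        return results

    # Both average ~0.5 in the long run, so the rating model can't tell them
    # apart, yet after seeing a few recent games the streaky source is far
    # more predictable.
    random.seed(1)
    print(sum(iid_games(0.5, 100000)) / 100000)
    print(sum(streaky_games(0.8, 0.2, 0.95, 100000)) / 100000)

A human watching the streaky source would pick up on the form swings
immediately; the memoryless model throws that information away by construction.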


>The defence of human interpretation isn't wrong. It's just not relevant to the
>case at hand, since the precision of empirical evaluation is based on a large
>sample of games. Small samples are not interesting for estimation.

They're interesting when you don't have much data, such as with DB II or, say,
if you're playing against a new program at a tournament and can obtain only the
handful of games it played in earlier rounds. The strength of human judgment and
pattern recognition lies precisely in these types of situations, which are
poorly defined from the viewpoint of clean, idealized models.

Just because someone's favorite model doesn't work well in some situations
doesn't mean those situations are uninteresting. If I were buying a program, I
would find it much more useful (and interesting) to read a review in which a
strong player describes his impressions and judgments from playing a handful of
games against the program than to read that the program scored 158.5:141.5
against Fritz.

Of course, a human cannot keep track of, much less analyze, hundreds of
thousands of games among thousands of players to make a judgment on such a large
sample. So statistics is the necessary evil, i.e. in the absence of anything
better one does what one can or must in such a situation. It's like cooking: a
dish individually prepared by an expert chef always beats the mass-produced
supermarket dish. The latter is a compromise, a trade-off of quality for
quantity. The same goes for ratings: they're the only practical and inexpensive
way to keep track of a large number of players and game results, without any
need to think, analyze, or know anything about chess at all.
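
And the bookkeeping really is that cheap. A sketch of the standard Elo update,
the entire per-game computation (K = 20 is just a common choice of constant, not
anything from this discussion):

    def elo_update(rating, opp, score, k=20.0):
        # Standard Elo update: R' = R + K * (S - E), where the expected
        # score is E = 1 / (1 + 10**((opp - rating) / 400)).
        expected = 1.0 / (1.0 + 10.0 ** ((opp - rating) / 400.0))
        return rating + k * (score - expected)

    # A 2400 beats a 2500 and gains about 12.8 points. No chess knowledge,
    # no analysis, just arithmetic on results.
    print(elo_update(2400, 2500, 1.0))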


> They're random from a statistical point of view and from a human
> estimation point of view.
>

They're much less random within the human model.


