Computer Chess Club Archives



Subject: Re: rebel 10~!! super strong on amd k62 500

Author: Ratko V Tomic

Date: 13:23:27 07/28/00


>> For a small number of games the judgment of a
>>knowledgeable human player is clearly a better predictor.
>
>Maybe, but it's far from certain.
>
It is quite certain. A good player who plays a handful of games against a
program knows not only the results of those games but also exactly how the
results came about: all the ply-by-ply struggles and the opportunities, whether
missed or seized, by the program. Of course he can't tell you all the loopholes
in the opening book or the gaps in the endgame knowledge, but the rating
calculation won't tell you any of that either. On a small number of games the
judgment of a good player is a far better predictor than the rating computed
from the same set of games. It is not even a close call.

>>The rating as a predictive model amounts to no more than
>>essentially saying -- the results so far were A:B, so I
>>predict that they will most likely remain A:B. That is really
>>the most simple minded kind of prediction one can make about
>>anything.
>
>That may be true. But making conclusions after observing a handful of games is
>just plain stupid.
>

Refusing to draw conclusions (however provisional) from whatever information
one has at a given time is obviously more foolish than drawing conclusions
throughout and revising them as more information arrives. The strength of the
human style of real-time, continuous modeling and model revision, i.e. of human
intelligence, lies precisely in forming preliminary conclusions, working models,
from a very scant amount of data. For some reason you claim, absurdly, that
improving one's predictive odds (by modeling the situation on whatever
information is available) is stupid.


>>Imagine such a predictor applied to 5 coin tosses, where
>>4 came out heads, 1 tails. A human would predict that on 1000
>>tosses the most likely outcome would be 500:500, while the rating
>>would predict 800:200. If I were to bet on who will come closer
>>on 1000 tosses here, I would pick the human every time. A human
>>observer uses additional information to make a much better
>>prediction (such as observation and knowledge of the degree
>>of motor control the person tossing the coin could have).
>
>That's nonsense. You don't extrapolate on such a small basis due to the
>uncertainty of the result.

You seem to miss the point. No one is saying that more games are worse than
fewer games. The question was whether something more useful or efficient than a
flat statistical model (rating models assume a memoryless process, similar to
coin tossing) can be done when you have only a small number of games. The
answer is yes: much more can be extracted from a few games than a rating
calculation can extract. I would bet my money on the prediction/judgment of a
strong player experienced in playing against programs after, say, he has played
5 games against the program, over a rating computed from those five games, or
even over a rating from 100 games. I would also put more trust in his
prediction, based on such a small and non-representative sample, if one had to
bet on how the program will do against a third player.
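
To make the coin example above concrete, here is a toy sketch in Python
comparing the two predictors. The "rating-style" estimate simply extrapolates
the observed 4:1 frequency; the observer who brings in side information is
modeled here as a strong prior that the coin is fair (the Beta(50,50) prior is
an arbitrary illustrative choice, not anything from the rating system). With a
fair coin, the prior-informed estimate lands near 500:500 nearly every time,
which is exactly the point of the quoted example.

import random

heads, tails = 4, 1          # the observed 5 tosses from the example
n_future = 1000

# "Rating-style" predictor: extrapolate the raw observed frequency.
p_flat = heads / (heads + tails)                 # 0.8 -> predicts 800:200

# "Human-style" predictor: the same data tempered by a strong
# fair-coin prior (Beta(50,50), an illustrative assumption).
a, b = 50, 50
p_prior = (heads + a) / (heads + tails + a + b)  # ~0.514 -> predicts ~514:486

# Simulate the next 1000 tosses of a genuinely fair coin and compare errors.
actual = sum(random.random() < 0.5 for _ in range(n_future))
print(f"flat model: {p_flat * n_future:.0f} heads, "
      f"error {abs(p_flat * n_future - actual):.0f}")
print(f"with prior: {p_prior * n_future:.0f} heads, "
      f"error {abs(p_prior * n_future - actual):.0f}")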


>
> If you had any idea about chess games played by a single program,
> you would know that it's capable of covering the entire spectrum
> from brilliancies to blunders.
> A good player would be unable to fathom all aspects of most
> toplevel programs within a handful of games.


Well, of course: when evaluating programs, all else being equal, the more a
person knows about programs in general, and about earlier versions of the
program under test, the more accurate his prediction will be.

As to your point about the human player/evaluator missing the whole
spectrum... etc., the rating formula knows even less about the "spectrum" and
"brilliancies" than a human player does. The rating calculation sees only loss,
draw, or win for an entire game, i.e. at most log2(3) = 1.58 bits of
information per game. Even without deploying any GM-level chess knowledge,
there is more (predictively usable) information about the program's strength in
a single move than what the rating calculation extracts from the whole game.
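
A back-of-the-envelope version of that arithmetic (the 40-move game length and
the crude 3-level move-quality scale are illustrative assumptions, not anything
the rating system defines):

import math

# A game result is one of three outcomes (win/draw/loss):
bits_per_game = math.log2(3)          # ~1.585 bits for the whole game

# Suppose an observer merely grades each move into 3 rough classes
# (good/dubious/blunder) over a 40-move game:
moves = 40
bits_per_move = math.log2(3)
print(f"rating sees per game: {bits_per_game:.2f} bits")
print(f"crude move grading:   {moves * bits_per_move:.1f} bits")  # ~63 bits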

> And what measure
> of comparison would he use? He can't use another chess program,
> because that would increase the uncertainty: this particular
> opponent might amplify strengths or weaknesses in the program
> whose capabilities you're estimating.

The measure of a predictive model is how well it predicts. If you wanted to
verify whether a human player models the relative program strengths better than
a rating calculated from the same small set of games, you could, for example,
have the human evaluator observe and analyze the few games between the programs
and make his prediction for the next 100 games. At the same time, the SSDF
could compute ratings from those same games and let the ratings predict the
result of the next 100 games. If you had to bet on who will come closer to the
actual result over the next hundred games, the human predictor or the rating
formula, whose prediction would you pick?
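
For reference, here is what the rating side of that bet looks like, a minimal
sketch using the standard Elo expected-score formula (the 4-1 five-game score
is carried over from the coin example purely for illustration):

import math

# Performance rating difference implied by a score, via the standard
# Elo logistic model: E = 1 / (1 + 10 ** (-d / 400)).
def rating_diff(score, games):
    p = score / games
    return 400 * math.log10(p / (1 - p))

def expected_score(d):
    return 1 / (1 + 10 ** (-d / 400))

d = rating_diff(4, 5)                 # 4/5 -> about +241 Elo
print(f"implied difference: {d:.0f} Elo")
print(f"predicted score over next 100 games: "
      f"{100 * expected_score(d):.0f} - {100 * (1 - expected_score(d)):.0f}")

Note how the formula simply hands back the observed 80%, i.e. exactly the
"results so far were A:B, so they will remain A:B" prediction described above.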

Of course, none of this means that the conventional rating is useless. My
original point is that the folks ridiculing a human player's judgment after a
few games as worthless are misapplying, out of ignorance and/or malice, the
uselessness of simple-minded statistical modeling (such as the memoryless,
stationary process) to the strengths of human modeling of situations with high
uncertainty (such as a very small sample of games).


