Computer Chess Club Archives


Subject: Re: rebel 10~!! super strong on amd k62 500

Author: Ratko V Tomic

Date: 13:01:55 07/29/00



>> Or if you had to bet which one will come closer, would you
>> pick the human judgment or the rating prediction based on
>> a handful of games?
>
>Probably the human evaluation, but it's an artificial question.
>

It is not artificial at all. Before any evaluation reaches 300 games, it will
have passed through 1 game, then 2 games, and so on; the few-game sample occurs
by necessity whenever a large sample does, so what you call "artificial" in fact
occurs more often. What you are really saying with the dismissive term
"artificial" is "unsuitable for the evaluation method I use."

> You don't do
> predictions on a handful of games using statistics for obvious reasons.

That doesn't mean you can't, or ought not to, make predictions at all (in some
way other than the simple statistical models), or that anyone who makes them is
wrong or wasting time. You and everyone else make predictions from small samples
all the time. When you pick someone to vote for in an election, you do it based
on your (implicit) prediction of how well the various candidates would serve
you, your family, or your country, even though they may never have held such an
office and you have no statistically significant basis for your choice.
Similarly, you pick product brands in the supermarket without a statistically
significant basis for your choice. And so on.

Just because something has no statistical significance within a simple
memoryless model (glorified coin flipping) doesn't mean it is insignificant or
"artificial," or that any other evaluation or modeling is "stupid." That was
exactly the point on which this sub-thread arose -- some people jumped to
ridicule a poster for presenting his judgment about the two programs based on a
"statistically insignificant" number of games. Being unable to compute a usable
ELO from that number of games doesn't mean nothing useful can be observed in
some other way (such as by human analysis). I simply pointed out the
narrow-mindedness and ignorance of such ridicule.
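
To make the coin-flipping point concrete, here is a minimal sketch (my own
illustration, not something from this thread) of the Elo interval that the
memoryless model yields from a handful of games. It assumes the standard
logistic Elo formula and a normal approximation, ignores draws (which overstates
the variance somewhat), and the function names are made up:

  import math

  def elo_diff(p):
      # Standard logistic Elo relation: expected score p -> rating gap.
      return 400.0 * math.log10(p / (1.0 - p))

  def elo_interval(score, games, z=1.96):
      # Rough 95% interval on the rating gap, treating each game as an
      # independent Bernoulli trial (the "memoryless" model above).
      p = score / games
      se = math.sqrt(p * (1.0 - p) / games)
      lo = max(p - z * se, 1e-9)          # clip away from 0 and 1 so
      hi = min(p + z * se, 1.0 - 1e-9)    # the logarithm stays finite
      return elo_diff(lo), elo_diff(p), elo_diff(hi)

  # Scoring 2-1 over 3 games gives roughly (-325, +120, +3600) after
  # clipping, i.e. no usable rating -- which was never in dispute.
  print(elo_interval(2, 3))

The dispute is whether that exhausts what 3 games can tell you; as argued
above, it does not.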


>>They're much less random within the human model.
>
>No, again. It's like choosing between very, very random and very random.
>

Neural networks, such as the brain, employ (implicit) statistical modelling of
vastly greater sophistication than the models that are easy to formalize
mathematically (such as those taught to students in a statistics course). The
models the brain builds are far better at generalizing and extracting patterns
from small and/or noisy samples than the memoryless random process (which
underlies the ELO computation, as well as coin flipping). But the brain has
severe capacity limitations when it tries to apply these powerful methods to
large samples: a person cannot physically analyze hundreds of thousands of games
or keep track of the evaluations of tens of thousands of players. For large
samples we therefore use the simple statistical models by necessity. Each method
has its domain of superiority.
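
As a toy illustration of that division of labor (my own construction, with a
purely hypothetical flaw and made-up numbers): suppose a program always loses
one particular opening trap but otherwise plays well. A pattern-aware observer
who conditions on the opening can spot the flaw within a handful of games, while
the memoryless view of the same handful is just one aggregate score:

  import random
  random.seed(1)

  def play_game(opening):
      # Hypothetical flaw: the program always loses the "trap" line,
      # otherwise it scores about 60%.
      if opening == "trap":
          return 0
      return 1 if random.random() < 0.6 else 0

  openings = ["normal", "trap", "normal", "trap", "normal"]
  games = [(o, play_game(o)) for o in openings]

  # Memoryless view: one win rate over 5 games, statistically empty.
  print(sum(r for _, r in games) / len(games))

  # Pattern-aware view: 0/2 in the trap line already points at the flaw.
  print([r for o, r in games if o == "trap"])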

> No
>matter the choice, the actual estimation is uncertain and useless.

Uncertain, yes, but still substantially less so than the ELO computation (even
one based on a much larger sample).

And a human evaluation based on a small sample is most certainly not "useless."
A review of a program by a knowledgeable and fairly strong player, based on an
in-depth analysis of, say, 3 games he played against the program, is for me much
more valuable (e.g. for a purchasing decision) than being told that the program
scored 157.5 - 142.5 against Fritz 6a at tournament time controls. A human
evaluator can have such a review on the web the day after the program comes out,
while it takes the SSDF months (at best, if they pick the program up at all) to
play the hundreds of games needed to say something useful about it.
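
For the record, my own arithmetic on that score (using the standard logistic
formula, under the same coin-flip model): 157.5/300 = 0.525, and
400*log10(0.525/0.475) gives about +17 Elo, with a 95% interval of roughly -22
to +57. So even 300 games pin the difference down only to within tens of Elo
points, which is part of why I find a good human review more informative for a
purchasing decision.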

Having bought chess machines and chess programs since 1981, I have found human
reviews and strength evaluations of much greater value for my own purchasing
decisions than the SSDF ratings (especially those of recent years, now that the
evaluation of strength against humans has been abandoned entirely). I suppose
this may be due in part to my own 'peculiarity' of playing almost exclusively
against the programs myself, instead of playing one program against another. For
those interested chiefly in running comp-comp tournaments, the SSDF list would
presumably be more useful, since it models that type of strength more closely
than a human evaluator does.





