Author: Ratko V Tomic
Date: 13:01:55 07/29/00
>> Or if you had to bet which one will come closer, would you
>> pick the human judgment or the rating prediction based on
>> a handful of games?
>
> Probably the human evaluation, but it's an artificial question.

It is not artificial at all. Before you reach 300 games in any evaluation, you will have reached 1 game, then 2 games, ... i.e. the few-game sample occurs by necessity whenever a large game sample occurs. What you call "artificial" occurs more often. What you're really trying to say by using the dismissive term "artificial" is "unsuitable for the evaluation method I use."

> You don't do predictions on a handful of games using statistics
> for obvious reasons.

That doesn't mean you can't or ought not to make predictions at all (in some way other than the simple statistical models), or that anyone who makes them is wrong or wasting time. You and everyone else make predictions based on small samples all the time. When you pick someone to vote for in an election, you do it based on your (implicit) prediction of how well the various candidates may benefit you, your family or your country, even though they may never have held such an office and you have no statistically significant basis for your choice. Similarly, you pick product brands in the supermarket without a statistically significant basis for your choice. And so on.

Just because something has no statistical significance within a simple memoryless model (glorified coin flipping) doesn't mean it is insignificant, "artificial," or that any other evaluation or modeling is "stupid." That was exactly the point on which this sub-thread arose -- some people jumped to ridicule a poster for presenting his judgment about the two programs based on a "statistically insignificant" number of games. Even if you can't compute a usable ELO out of that number of games, it doesn't follow that nothing useful can be observed some other way (such as by human analysis). I simply pointed out the narrow-mindedness and ignorance of such ridicule.

>> They're much less random within the human model.
>
> No, again. It's like choosing between very, very random and very random.

Neural networks, such as the brain, utilize (implicit) statistical modelling of vastly greater sophistication than the models which are easy to formalize mathematically (such as those taught to students in a statistics course). The models created by the brain are certainly more sophisticated than a memoryless random process (which underlies the ELO computation, as well as coin flipping), being able to generalize and extract patterns from small and/or noisy samples far better than the simple memoryless-process model. But the brain has severe capacity limitations when trying to apply these powerful methods to large samples: a person cannot physically analyze hundreds of thousands of games and keep track of evaluations of tens of thousands of players. So by necessity we use the simple statistical models for large samples. Each method has its domain of superiority.

> No matter the choice, the actual estimation is uncertain and useless.

Uncertain, yes, but still substantially less so than the ELO computation (even on a much larger sample). And the human evaluation based on a small sample is most certainly not "useless." A review of a program by a knowledgeable and fairly strong player, based on an in-depth analysis of, say, 3 games he played against the program, is for me much more valuable (e.g. for a purchasing decision) than if someone told me that the program scored 157.5 - 142.5 against Fritz 6a at tournament time controls.
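To make that comparison concrete, here is a quick sketch (in Python, my own illustration, not anything from the SSDF or this thread) of what the coin-flipping model actually lets you conclude at those two sample sizes. The 157.5 - 142.5 score is the one cited above; the 2.5/3 score is a hypothetical stand-in for a reviewer's three games, and the normal approximation is of course crude at n = 3, which is rather the point:

    import math

    def elo_diff(p):
        # Elo logistic model: p = 1/(1 + 10**(-D/400)), hence
        # D = 400 * log10(p / (1 - p)).
        return 400.0 * math.log10(p / (1.0 - p))

    def elo_estimate(points, games, z=1.96):
        # Point estimate and ~95% confidence interval for the Elo
        # difference implied by scoring `points` out of `games`.  Each
        # game is treated as an independent Bernoulli trial (the
        # "glorified coin flipping" model); draws only shrink the
        # variance, so the interval is, if anything, conservative.
        p = min(max(points / games, 1e-6), 1.0 - 1e-6)
        se = math.sqrt(p * (1.0 - p) / games)     # binomial standard error
        lo = max(p - z * se, 1e-6)                # clamp inside (0, 1)
        hi = min(p + z * se, 1.0 - 1e-6)
        return elo_diff(p), elo_diff(lo), elo_diff(hi)

    # The 300-game match cited above: 157.5 - 142.5 against Fritz 6a.
    print("300 games: %+.0f Elo, 95%% CI [%+.0f, %+.0f]" % elo_estimate(157.5, 300))
    # A hypothetical 2.5/3 from a reviewer's three games.
    print("  3 games: %+.0f Elo, 95%% CI [%+.0f, %+.0f]" % elo_estimate(2.5, 3))

The 300-game match works out to roughly +17 ELO with an interval of about -22 to +57, i.e. it does not even settle which program is stronger; the 3-game interval is so wide it carries no information at all, which is exactly why the ELO machinery is the wrong tool there and analysis of the games themselves is the right one.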
A human evaluator can have such a review on the web the day after the program comes out, while it will take months (at best, if they pick it up at all) for the SSDF to play the hundreds of games needed to say something useful about it. Having been buying chess machines and chess programs since 1981, I have found human reviews and strength evaluations of much greater value for my own purchasing decisions than the SSDF ratings (especially those of recent years, in which the evaluation of computer-vs-human strength has been entirely abandoned). I suppose this may in part be due to my own 'peculiarity' of playing almost exclusively against the programs myself, instead of playing one program against another. For those interested chiefly in running comp-comp tournaments, the SSDF list would presumably be more useful, since it models that type of strength more closely than the human evaluator does.
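As a footnote on how long "something useful" takes, here is a rough sketch under the same coin-flip assumptions as above (two roughly equal engines, independent games; again my own illustration, not an SSDF procedure) of how many games a given ELO precision requires:

    import math

    def games_for_elo_ci(half_width_elo, z=1.96):
        # Near p = 0.5 the curve D = 400*log10(p/(1-p)) has slope
        # (400/ln 10) / (p*(1-p)) ~ 695 Elo per unit of score, so the
        # 95% half-width in Elo is roughly 695 * z * 0.5 / sqrt(n).
        slope = 400.0 / math.log(10) / 0.25
        return math.ceil((slope * z * 0.5 / half_width_elo) ** 2)

    for hw in (50, 20, 10):
        print("+/- %d Elo needs about %d games" % (hw, games_for_elo_ci(hw)))

Pinning a program down to +/- 20 ELO takes roughly 1160 games between near-equal opponents; at tournament time controls that is indeed months of machine time, while the human reviewer's verdict can be on the web the next day.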