Author: Don Dailey
Date: 12:15:10 12/13/97
On December 13, 1997 at 13:50:54, Thorsten Czub wrote:

>Ooops - now I am in big trouble because I have to disagree with Bruce
>and Don together ! Hm...

You are in no trouble! I will not get angry if someone disagrees with me. Maybe I will even change my mind if I like your arguments.

>>How right you are! In testing I've done, I have seen scores more
>>lopsided than this based on 100 games or so turn completely around
>>after another 200 games or so. It's to be expected.

>May be. But scores is not important ! The games tell about the ELO, not
>the score. The score various from each game, win - loss - loss - win -
>win etc.
>ELo depending on score goes up and down. But I don't need to build the
>ELO from the SCORE !
>I can try to estimate the ELO by looking in the games.

I still say you cannot do this with any kind of reasonable accuracy.

>If you evaluate texts from pupils they have written about their last
>holidays, you can give remarks about how many mistakes they have done
>(typing and writing errors = quantifying method) or by evaluating their
>style and expression and fantasy (qualitfying method).

It depends on what you are trying to evaluate about the student.

>>I remember a company once years ago producing a glossy document
>>descibing and annotating a 10 game match between it's new chess
>>computer and the previuos model. It was Morphy vs the spraklen 2.5
>>program if I remember correctly. The match result was 7 to 3 in favor
>>of the new thing. But this kind of result is even less reliable than
>>the actual score would suggest because you have to realize that they
>>would not even be printing the results had it not gone their way! This
>>advertising probably did appeal to the general public and it was fun
>>reading (and maybe Morphy really was stronger) but I definitely took it
>>with a grain of salt!
>
>Very OLD memories Don - :-)

Woops, I'm showing my age!

>Right. But don't you make a mistake by saying that STRENGTH is absolute

Yes, this is a mistake in my opinion. Sometimes the "lower rated" player can win more than his fair share of games against a higher rated player because his style is "right" for that one opponent (or class of opponents). But I do not believe this effect is very strong. I doubt it could ever stretch someone's rating by more than 50 points or so against some opponent (WARNING: I'm taking a wild subjective guess here).

I think a lot of psychology is happening here. You notice your losses to some opponent and, based on ridiculously low sample sizes, start believing that something is going on. Then you start looking for it, expecting it, and you keep noticing it. You're thankful for each win, but when you lose you say, "SEE there? It's happening again!" If it involves humans vs humans, you actually become psyched out and it becomes a self-fulfilling prophecy.

The whole ELO rating system is based on the idea that strength really is absolute. I've always thought it would be a really interesting experiment to take all the data from tournaments and try to determine whether people fall into a few well-known classes! It may turn out to be like rock, scissors and paper! If you are "rock", your rating is effectively 50 points higher when you are playing a "scissors" but 50 points lower against "paper"!

All tournament players know there are just 2 or 3 basic classes of players to look out for: the tactical player, who is not very strong strategically but sees tactics well and is tricky; the careful player, who defends really well and doesn't mind being a little cramped; the booked-up fish; etc.

>I remember that John Nunn once was impressed by a computer because it
>played "very strong". In fact the computer played misable against my
>friend Bernd Kohlweyer, not a strong IM (only 2420) but enough to smash
>this machine into targets. WHY was John Nunn impressed ?
>Because the machine played like himself, tactically.
>We are always IMPRESSED when somebody holds a mirror of ourselves into
>our direction. Try it with an ape, and he will be impressed too.
>but does this say ANYTHING about strength ?
>No.

If you've played your machine against humans, you will notice that humans seem to have come a long way against computers! Everyone has one now, and people are no longer afraid of them and know how to play against them. BUT STILL, I'll bet this is not worth any more than about 50 points or so.

What we may have noticed is that people used to play much worse than their true strength against computers (they had too much respect), and now they play better than their true strength in cases where they have lots of experience playing them. So the apparent difference may be as much as 100 rating points. Again, I'm making this up as I go along!

>And your score's ? If you know A wins against B in 75 : 33 games ! And
>you know A's ELO, do you know B's ELO ? Really ?
>I don't think so.
>If you know a girl and she kisses nice, and you know another girl that
>kisses only 2.3 times weaker, you don't really KNOW this girl !
>You don't even know her strength. All you know is how she kisses in
>relation to girl A.

I think the intransitivity is very strong here. Girl B's performance is probably strongly related to the person she is kissing!

>And all you can say after the match A against B is: a plays (against B)
>xyz ELO stronger. I don't think A will get the same amount of score he
>got against B, against a bunch of ELO B players.

Again, I agree but only very weakly.

>When do we come to the point that we understand that NOTHING in the
>world is deterministic ?
>Of course we always try to make it deterministic. We like EASY
>statements like: A is better B. Because a short way is faster to follow
>than a long way. Also a short way gives earlier results.
>But - and here I am sure, I can compete ANY competition with an
>autoplayer.
>I will always find out much earlier than any 40/120 autoplayer-system
>HOW STRONG a chess program is.
>I don't need 100 games. Nor 50.

I think you do!

>And if the scientifical approach NEEDS this much games to be precise,
>than the method they use is senseless because it is beaten by another
>method that is much faster and as precise (or even more precise).

Human judgement is notoriously imprecise. I think you have very little chance of saying with any degree of confidence that player A is better than B unless the difference is large.

>Please, don't get me wrong:
>ANY guy can measure this, if he can feel it.
>it is not me. It is the long time I have done this. I am sure any
>wine-expert is as good in testing wine and does not need machines and
>experiments, I am also sure any mechanician is much better than any
>machine in finding out which car is better....
>
>So, please let me survive now... although I am against your opinion.
>It is christmas time again.

How dare you disagree with me!

Seriously, I do believe there are some qualities in chess programs that are pretty difficult, perhaps impossible, to measure scientifically and that require human judgement to perceive. But I am just saying I don't think chess strength is one of them, except in a very "grainy", low-resolution sense. You can definitely say you like this program's style much better, or that it tends to be more sacrificial or more positional, etc. But you have to decide in advance what you are trying to evaluate.

-- Don
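To put rough numbers on the sample-size argument in this post, here is a minimal sketch that converts a match score into an Elo difference with an approximate 95% range. It assumes the common logistic Elo curve, treats games as independent win/loss trials (draws are ignored), and uses a simple normal approximation; the function names `elo_diff` and `elo_interval` are made up for illustration and are not from the post.

```python
import math

def elo_diff(p):
    """Elo difference implied by an expected score p (0 < p < 1),
    using the logistic curve E = 1 / (1 + 10 ** (-D / 400))."""
    return 400.0 * math.log10(p / (1.0 - p))

def elo_interval(wins, losses, z=1.96):
    """Return (low, estimate, high) Elo differences for a match result.

    Assumption: games are independent win/loss trials, and the score
    fraction is roughly normal (a crude approximation for short matches).
    """
    n = wins + losses
    p = wins / n
    se = math.sqrt(p * (1.0 - p) / n)  # standard error of the score fraction
    eps = 1e-6                         # keep the bounds strictly inside (0, 1)
    p_lo = min(max(p - z * se, eps), 1.0 - eps)
    p_hi = min(max(p + z * se, eps), 1.0 - eps)
    return elo_diff(p_lo), elo_diff(p), elo_diff(p_hi)

# The two results mentioned in the thread: the 7 to 3 match and 75 : 33.
for wins, losses in [(7, 3), (75, 33)]:
    lo, mid, hi = elo_interval(wins, losses)
    print(f"{wins}-{losses}: about {mid:+.0f} Elo, "
          f"rough 95% range {lo:+.0f} to {hi:+.0f}")
```

Under these assumptions the 10-game result is compatible with the "winner" actually being the weaker side, and even the 108-game result leaves a range well over 100 Elo wide, which is in line with the claim that lopsided scores over 100 games or so can still turn around.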