Author: Albert Silver
Date: 04:22:28 12/14/99
On December 13, 1999 at 07:53:26, Bertil Eklund wrote:

>On December 13, 1999 at 06:29:21, Albert Silver wrote:
>
>>On December 12, 1999 at 17:18:46, Roger wrote:
>>
>>>Yep, the SSDF is a pool unto itself, and as such, its ratings can't be compared
>>>to those of humans.
>>>
>>>The problem in saying that the ratings are 100% ACCURATE is: Accurate compared
>>>to what? If the ratings are what they are, that is, if the pool is 100%
>>>isolated, then the statement is tautological: The SSDF ratings are accurate
>>>compared to themselves. Not very exciting.
>>
>>Not sure what you mean.
>>
>>>
>>>So...my opinion is that statements about the accuracy of the ratings must refer
>>>to some external source of validation, in other words, some reference point
>>>outside the pool itself.
>>
>>Why?
>>
>>>
>>>And that, of course, would be human ratings.
>>
>>I don't understand how adding games against humans will make a rating system
>>that calculates how computers do against other computers more precise. If
>>anything it will make it valueless.
>>
>>>
>>>So, as more IM and GM versus computer games emerge, the SSDF ratings can
>>>eventually be recalibrated
>>
>>Recalibrated? You are assuming they are supposed to be connected. SSDF ratings
>>calculate computer versus computer ratings. If you change the pool, you change
>>what they are calculating, not making it more precise. Where is it imprecise?
>
>Hi!
>
>Of course they are (was) connected, once upon a time (1993) the list was
>connected with about 300 games against humans. The level of the list is still
>based on those games.

I know. Back when I was just learning the game, my interest in computers playing it had already been piqued, and I enjoyed watching the strong players wrestle against the wooden wonder, the Mephisto Roma, etc. Pierre Nolot, a close friend of the shop management, used to bring in the latest news, including the SSDF ratings.
It was remarked that the French ratings of the Par Excellence and the new phenomenon, the Mach III, were remarkably close to their SSDF ratings. Both French ratings had been obtained by playing against some 40 opponents at a time control of 40 moves in 2 hours. The Par Excellence was rated 1850, while the Mach III was considered the first to break the then mythical barrier of 2000 Elo with a rating of 2036. The SSDF had them listed at 1835 and 1993 respectively.

But when humans were removed from the competition, no one realized what would happen, and what has since happened. The programs were quite consistent in their progress: newer generations would be a little smarter but above all much faster, so that progress really became a matter of an extra ply. This consistency in the difference between the opponents is completely different from what happens with humans. Every doubling in speed produces an extra ply or so and, as noted, an extra 60-70 points, yet with humans it doesn't work that way. Can I simply say that I am calculating an extra ply at every move over my opponent who is rated 60-70 points less? Of course not. And it goes on: what about the opponent rated 130-140 points less? Not even then. I may simply rely more on my positional play and possibly calculate LESS (depending on the position).

Programs, for all their phenomenal progress, have progressed considerably more in speed and depth than in positional play. What has happened is that the pool has excluded a certain type of competitor, whose weapons are undoubtedly different, and now reflects the differences within only one type: the computer. With all due respect, tactics make up at LEAST 80% of the difference between programs IMO (in the SSDF list), so the ratings reflect the tactical difference for the most part.
Programs still aren't capable of using positional play to outweigh tactical insufficiencies the way humans can, so outcalculating the opponent is still a solution in computer chess. This only became visible as the ratings hit new peaks, because they ran into the range reserved for IMs and GMs. These are a different kind of animal altogether. Usually, the biggest difference between human players up to a strength of 2000 is one of tactics, but beyond that it changes, and when you hit IM strength you will find strategists who can easily make up for very poor tactical play. In computer chess you don't really have that, so the ratings no longer resemble the numerically similar human ratings.

                                    Albert Silver

>
>Bertil SSDF
>
>>                                    Albert Silver
>>
>>> so that ELO differences between humans and programs
>>>ARE meaningful. It will simply take time for a pool of games to emerge. Then the
>>>whole matter can be handled with the rigor of statistical methods.
>>>
>>>Roger
>>>
>>>
>>>On December 12, 1999 at 08:49:08, Albert Silver wrote:
>>>
>>>>Hi all,
>>>>
>>>>As the issue of SSDF ratings, and their comparative value with USCF or FIDE
>>>>ratings, has been a recurring theme and a number of threads have sprouted
>>>>recently, I thought I'd share my opinion (self-plagiarized) as I think it is
>>>>relevant and might shed some light on the matter.
>>>>
>>>>SSDF ratings: inflated or not?
>>>>Here's what I think: the ratings are not inflated in the least bit.
>>>>Sounds crazy doesn't it? But it's not. People get too caught up trying to make
>>>>these futile comparisons between SSDF ratings and human ratings whether USCF,
>>>>FIDE, or whatever. The point is, and it has been repeated very often, there
>>>>simply is no comparison. The only comparison possible is that both are generated
>>>>using Elo's rating system, but that's where it ends.
>>>>Elo's system is supposed to
>>>>calculate, according to a point system, the probability of success between
>>>>opponents rated in that system. The SSDF rating list does that to perfection,
>>>>but it is based on the members of the SSDF only. If you put Fritz 5.32 on fast
>>>>hardware up against the Tasc R30 or whatnot, it will pulverize the machine. The
>>>>difference in SSDF ratings accurately depicts that. It has NOTHING to do with
>>>>FIDE or USCF ratings. The rating of Fritz, Hiarcs, or others on the SSDF rating
>>>>list depicts their probability of success against other programs on the SSDF
>>>>list, and that's it. It doesn't represent their probability of success against
>>>>humans because humans simply aren't a part of the testing. If you want to find
>>>>out how a program will do against humans then test it against humans, and then
>>>>you will find its rating against them. The SSDF rating has nothing whatsoever
>>>>to do with that. As was pointed out, I believe the SSDF ratings pool is a pool
>>>>that is COMPLETELY isolated from all others and as such cannot possibly be
>>>>compared with them.
>>>>
>>>>                                    Albert Silver
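
For readers who want to see what "probability of success between opponents rated in that system" means in numbers, here is a minimal sketch. It assumes the common logistic form of the Elo expected-score formula; the thread does not say which variant the SSDF uses, and the function name `expected_score` is only illustrative. The 1835 and 1993 figures are the SSDF ratings quoted above for the Par Excellence and the Mach III, and the 2000 vs. 2070 pair is a hypothetical example of the 70-point gap associated with one extra ply.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B (win = 1, draw = 0.5, loss = 0).

    Assumption: the common logistic Elo formula, not necessarily
    the exact variant used for the SSDF list.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# SSDF ratings quoted in the thread: Mach III 1993, Par Excellence 1835.
mach3_vs_parex = expected_score(1993, 1835)

# Hypothetical pair separated by the 70 points attributed to one extra ply.
one_ply_edge = expected_score(2070, 2000)

print(f"Mach III vs Par Excellence: {mach3_vs_parex:.3f}")
print(f"70-point edge:              {one_ply_edge:.3f}")
```

Note that the formula only takes the rating difference as input, so the prediction is only meaningful for two opponents rated within the same pool, which is the point made in the post.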
Last modified: Thu, 15 Apr 21 08:11:13 -0700