Computer Chess Club Archives



Subject: Re: SSDF ratings are 100% accurate

Author: Albert Silver

Date: 04:22:28 12/14/99



On December 13, 1999 at 07:53:26, Bertil Eklund wrote:

>On December 13, 1999 at 06:29:21, Albert Silver wrote:
>
>>On December 12, 1999 at 17:18:46, Roger wrote:
>>
>>>Yep, the SSDF is a pool unto itself, and as such, its ratings can't be compared
>>>to those of humans.
>>>
>>>The problem in saying that the ratings are 100% ACCURATE is: Accurate compared
>>>to what? If the ratings are what they are, that is, if the pool is 100%
>>>isolated, then the statement is tautological: The SSDF ratings are accurate
>>>compared to themselves. Not very exciting.
>>
>>Not sure what you mean.
>>
>>>
>>>So...my opinion is that statements about the accuracy of the ratings must refer
>>>to some external source of validation, in other words, some reference point
>>>outside the pool itself.
>>
>>Why?
>>
>>>
>>>And that, of course, would be human ratings.
>>
>>I don't understand how adding games against humans would make a rating system
>>that measures how computers do against other computers any more precise. If
>>anything, it would make it valueless.
>>
>>>
>>>So, as more IM and GM versus computer games emerge, the SSDF ratings can
>>>eventually be recalibrated
>>
>>Recalibrated? You are assuming they are supposed to be connected. SSDF ratings
>>measure how computers perform against other computers. If you change the pool,
>>you change what they are measuring; you don't make it more precise. Where is it
>>imprecise?
>
>Hi!
>
>Of course they were connected: once upon a time (1993) the list was calibrated
>with about 300 games against humans. The level of the list is still based on
>those games.

I know. Back when I was just learning the game, my interest in computer chess
had already been piqued, and I enjoyed watching the strong players wrestle
against the wooden wonder, the Mephisto Roma, etc... Pierre Nolot, a close
friend of the shop management, used to bring in the latest news, including the
SSDF ratings. People remarked at the time how closely the French ratings of the
Par Excellence and of the new phenomenon, the Mach III, matched their SSDF
ratings. Both French ratings had been obtained against some 40 opponents at a
time control of 40 moves in 2 hours; the Par Excellence stood at 1850, while
the Mach III was considered the first to break the then-mythical barrier of
2000 Elo with a rating of 2036. The SSDF list had them at 1835 and 1993
respectively.

But when humans were removed from the competition, no one realized then what
would happen, and what has since happened. The programs were quite consistent
in their progress: each new generation would be a little smarter but above all
much faster, so that progress really became a matter of an extra ply. This
consistency in the differences between opponents is completely unlike what
happens with humans. For programs, every doubling in speed buys an extra ply or
so, worth an extra 60-70 points; with humans it doesn't work that way. Can I
simply say that I am calculating one ply deeper at every move than an opponent
rated 60-70 points below me? Of course not. What about an opponent rated
130-140 points below me? Not even then. I may simply be leaning on my
positional play and, in certain positions, calculating LESS.

For all their phenomenal progress, programs have advanced considerably more in
speed and depth than in positional play. What has happened is that the pool has
excluded a certain type of competitor, whose weapons are undoubtedly different,
and now reflects the differences within only one type: the computer. With all
due respect, tactics make up at LEAST 80% of the difference between programs on
the SSDF list IMO, so the ratings reflect tactical differences for the most
part. Programs still cannot use positional play to outweigh tactical
shortcomings the way humans can, so outcalculating the opponent remains a
winning strategy in computer chess. This only became visible as the ratings hit
new peaks and ran into the range reserved for IMs and GMs, who are a different
kind of animal altogether. Up to a strength of about 2000, the biggest
difference between human players is usually one of tactics, but beyond that it
changes, and by IM strength you will find strategists who can easily make up
for very poor tactical play. In computer chess you don't really have that, so
the ratings no longer correspond to the nominally similar human ratings.
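
As a back-of-the-envelope check, the standard Elo expectancy formula shows what
a gap of 60-70 points, i.e. roughly one extra ply between programs, means in
scoring terms. The sketch below is mine, in Python and purely illustrative; the
points-per-ply figure is my estimate above, not an SSDF-published constant.

    # Standard Elo expectancy: expected score of player A against player B.
    def expected_score(r_a, r_b):
        return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

    # Gaps of ~65 and ~130 points: roughly one and two extra plies by the
    # estimate above.
    for gap in (65, 130):
        print("+%d Elo -> expected score %.2f" % (gap, expected_score(gap, 0)))
    # +65 Elo -> expected score 0.59
    # +130 Elo -> expected score 0.68

In other words, within the pool one extra ply is worth scoring roughly six
points out of ten against the previous generation; no such mechanical
translation exists against human opponents.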

                                        Albert Silver

>
>Bertil SSDF
>
>>                                  Albert Silver
>>
>>> so that Elo differences between humans and programs
>>>ARE meaningful. It will simply take time for a pool of games to emerge. Then the
>>>whole matter can be handled with the rigor of statistical methods.
>>>
>>>Roger
>>>
>>>
>>>
>>>
>>>On December 12, 1999 at 08:49:08, Albert Silver wrote:
>>>
>>>>Hi all,
>>>>
>>>>As the issue of SSDF ratings, and their comparative value with USCF or FIDE
>>>>ratings, has been a recurring theme and a number of threads have sprouted
>>>>recently, I thought I'd share my opinion (self-plagiarized) as I think it is
>>>>relevant and might shed some light on the matter.
>>>>
>>>>SSDF ratings: inflated or not?
>>>>Here's what I think: the ratings are not inflated in the least bit.
>>>>Sounds crazy, doesn't it? But it's not. People get too caught up trying to make
>>>>these futile comparisons between SSDF ratings and human ratings whether USCF,
>>>>FIDE, or whatever. The point is, and it has been repeated very often, there
>>>>simply is no comparison. The only comparison possible is that both are generated
>>>>using Elo's rating system, but that's where it ends. Elo's system is supposed to
>>>>calculate, according to a point system, the probability of success between
>>>>opponents rated in that system. The SSDF rating list does that to perfection,
>>>>but it is based on the members of the SSDF only. If you put Fritz 5.32 on fast
>>>>hardware up against the Tasc R30 or whatnot, it will pulverize the machine. The
>>>>difference in SSDF ratings accurately depicts that. It has NOTHING to do with
>>>>FIDE or USCF ratings. The rating of Fritz, Hiarcs, or others on the SSDF rating
>>>>list depicts their probability of success against other programs on the SSDF
>>>>list, and that's it. It doesn't represent their probability of success against
>>>>humans because humans simply aren't a part of the testing. If you want to find
>>>>out how a program will do against humans then test it against humans, and then
>>>>you will find its rating against them. The SSDF rating has nothing whatsoever
>>>>to do with that. As was pointed out, I believe the SSDF ratings pool is a pool
>>>>that is COMPLETELY isolated from all others and as such cannot possibly be
>>>>compared with them.
>>>>
>>>>                                    Albert Silver
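
To round off the pool-isolation point quoted above: Elo's formula depends only
on rating differences, so the zero point of an isolated pool is arbitrary and
its absolute numbers carry no meaning outside it. A small sketch with made-up
names and ratings, purely for illustration:

    # Elo predictions depend only on rating DIFFERENCES, so shifting an
    # entire isolated pool by a constant changes nothing that it predicts.
    def expected_score(r_a, r_b):
        return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

    pool    = {"ProgA": 2450, "ProgB": 2380}               # made-up ratings
    shifted = dict((n, r - 300) for n, r in pool.items())  # same pool, new zero

    print(expected_score(pool["ProgA"], pool["ProgB"]))        # ~0.60
    print(expected_score(shifted["ProgA"], shifted["ProgB"]))  # identical

The absolute level of the list therefore has to come from outside the pool,
which is exactly the role of the roughly 300 games against humans from 1993
that Bertil mentions.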


