Author: Roger
Date: 03:29:00 12/12/99
On December 11, 1999 at 21:40:41, Albert Silver wrote:

>On December 11, 1999 at 13:12:01, Roger wrote:
>
>>>
>>>So you propose tossing the matches it played against certain opponents? How
>>>would you choose which matches to discard?
>>>
>>> Albert Silver
>>
>>I don't propose tossing anything. I am just pointing out that the stability of
>>Fritz's rating as asserted by Enrique is somewhat of an illusion. That doesn't
>>mean his position is false, simply that that single fact, cited to support his
>>position, might or might not support him if the ratings for Fritz were
>>calculated against the rest of the SSDF pool in segments of 250 games (I pick
>>250 as an arbitrary large number).
>>
>>You MIGHT then see a substantial rating decline, and just eyeballing the Fritz
>>numbers supplied by Enrique supports this idea. You would have to do the
>>calculations, of course, to see how extensive this decline was.
>>
>>But...even if the ratings were shown to decline over time, that doesn't
>>necessarily make the SSDF ratings flawed. As we all know, GM players book up
>>against each other and study each other's games for flaws, and any GM player
>>that doesn't do so will see their rating decline.
>
>I'm not sure what the reference to booking up against opponents has to do with
>this, but here's what I think: the ratings are not inflated in the least bit.
>Sounds crazy, doesn't it? But it's not. People get too caught up trying to make
>these futile comparisons between SSDF ratings and human ratings whether USCF,
>FIDE, or whatever. The point is, and it has been repeated very often, there
>simply is no comparison. The only comparison possible is that both are generated
>using Elo's rating system, but that's where it ends. Elo's system is supposed to
>calculate, according to a point system, the probability of success between
>opponents rated in that system. The SSDF rating list does that to perfection,
>but it is based on the members of the SSDF only. If you put Fritz 5.32 on fast
>hardware up against the Tasc R30 or whatnot, it will pulverize the machine. The
>difference in SSDF ratings accurately depicts that. It has NOTHING to do with
>FIDE or USCF ratings. The rating of Fritz, Hiarcs, or others on the SSDF rating
>list depicts their probability of success against other programs on the SSDF
>list, and that's it. It doesn't represent their probability of success against
>humans because humans aren't a part of the testing. If you want to find out how
>a program will do against humans then test it against humans, and then you will
>find its rating against them. The SSDF rating has nothing whatsoever to do with
>that.
>
> Albert Silver

Yeah, if you regard the SSDF as its own pool, forever isolated from any other pool, then you are absolutely correct.

I thought the parties in this dispute were arguing that the SSDF ratings are inflated relative to human players. The pools could someday be intercalibrated, so I think that's a valid comparison. I know that if I could wave a magic wand and make them so, I would, because it would be more meaningful. After all, there is no reason why the computer ratings can't be anchored to human ratings through human versus computer games.

But if the debate concerns whether the pool ratings are inflated relative to themselves (their true ratings), I'd agree with you. Otherwise, I think it's an empirical question that can be settled by evaluating whether the ratio of wins and losses versus human players is actually predicted by the rating differences between humans and computers.

Roger
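
For anyone who wants to try the empirical check, here is a rough sketch of what it could look like. It assumes the usual logistic Elo curve with a 400-point scale (Elo's original tables use a normal distribution, so the numbers differ slightly). The function names and the sample games are made up for illustration; they are not SSDF or tournament data.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score for A against B (win = 1, draw = 0.5, loss = 0),
    using the common logistic approximation of the Elo curve."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def compare_to_prediction(games):
    """Compare actual results with what the rating differences predict.

    games: iterable of (computer_rating, human_rating, computer_score),
    where computer_score is 1, 0.5, or 0 for that game.
    Returns (observed_total, expected_total) from the computer's side.
    """
    games = list(games)
    observed = sum(score for _, _, score in games)
    expected = sum(expected_score(comp, human) for comp, human, _ in games)
    return observed, expected


# Hypothetical results: a program carrying a list rating of 2600
# playing four games against rated humans.
sample_games = [
    (2600, 2400, 1.0),
    (2600, 2400, 0.5),
    (2600, 2500, 0.0),
    (2600, 2450, 0.5),
]

observed, expected = compare_to_prediction(sample_games)
print(f"observed {observed:.2f} vs expected {expected:.2f} "
      f"over {len(sample_games)} games")
```

If the observed total falls well short of the expected total over a large number of games, that would suggest the computer ratings are inflated relative to the human pool. A handful of games like the sample above proves nothing either way.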
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.