Author: Peter Fendrich
Date: 03:45:49 05/27/98
I have deliberately been quiet for a long time about this subject. It has been hard, but now I can't shut up any more... :) I have read myriads of misunderstandings about how the SSDF rating is done, under what conditions it is produced, and what it means.

What is really measured?
========================
It is NOT about Fritz5 against Rebel9, Genius5 against Nimzo98 etc. Rebel9 on a P90 is a completely different player from Rebel9 on a P200, and that holds for all programs of course. In addition, every computer type allows all kinds of configurations, which are almost impossible to cover. Different P200s with the same configuration sometimes have completely different performances. We have to live with that. It's a fact that the programs have different performances on different hardware relative to each other.

The SSDF list shows the strength within the pool of chess programs. It doesn't show the strength against humans. I'm convinced that there is a good overlap between these pools, but they are not the same in any sense.

What are the current conditions?
================================
The most important guideline for this kind of testing is to play against as many opponents as possible. SSDF tries to do that. However, it doesn't make sense to play games between opponents with a very big rating difference. There is a limited set of testers with a limited set of computers with different hardware. That implies that there has to be some kind of prioritization of what to test and when. The most natural choice is to let the latest programs get the latest hardware, and that's what the SSDF testers do. Soon there will be faster hardware and more memory available, and someone has to be first... Logistic or technical problems also have an impact on these guidelines.

What about the accuracy of the ratings?
=======================================
There are two different issues here. The first is whether the ratings within the list are accurate.
I would say that this is the most accurate rating list ever seen in this respect, much more reliable than any other Elo list for GMs and other chess players. So the rating *difference* between two programs on exactly that hardware is very reliable.

The other issue is whether the ratings themselves are accurate. That is very hard to say. The only way to know is to play a lot of real tournament games against human opponents fighting their best. We don't have many of those. The Aegon tournaments are maybe the best we have, and the results there comply very well with the ratings on SSDF's lists. I wouldn't be very surprised if there is an inflation of ratings, but it wouldn't upset me either... If they are, say, 50 points too high, just subtract 50 points from each rating.

There are a lot of games on ICC and FICS. The GMs (as well as others) playing there are somewhat specialized in playing against chess computers, and they probably do better against computers than the GM group as a whole. The conditions in general are somewhat unreliable for rating purposes. More controlled games under real tournament conditions between computers and humans would be great!

And....
=======
This is not an official statement from SSDF, and I haven't talked to Thoralf about this message. I am no longer part of the testing process myself, but I was in the very beginning the "chief designer" of how the ratings and confidence levels should be computed and what guidelines to set for the test procedures, especially the method of taking advantage of the fact that computer A on hardware X will always have the same strength. It will not vary over time as humans do. Look at: http://home3.swipnet.se/~w-36794/ssdf/ to know more about this. There is a FAQ.

//Peter
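To make the idea of a rating *difference* concrete: under the standard logistic Elo model, a score fraction between two opponents maps directly to an Elo difference. This is a minimal sketch of that conversion, not the actual SSDF computation (the exact formulas SSDF uses, including its confidence levels, are not reproduced here, so the logistic curve is an assumption on my part):

```python
import math

def expected_score(rating_diff):
    # Standard logistic Elo expectancy: expected score (wins + draws/2
    # per game) for the side that is rating_diff Elo points stronger.
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))

def rating_diff_from_score(score):
    # Inverse of the above: the Elo difference implied by an observed
    # score fraction, valid for 0 < score < 1.
    return -400.0 * math.log10(1.0 / score - 1.0)

# Example: program A scores 60 out of 100 games against program B,
# implying A is roughly 70 Elo points stronger within this pool.
diff = rating_diff_from_score(60 / 100)
```

Note that this says nothing about the absolute level of either rating; it only fixes their difference, which is why a uniform shift (e.g. subtracting 50 points from every program) leaves the list's internal ordering and gaps untouched.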