Computer Chess Club Archives



Subject: Re: About Fairness and Progress (it was Thorsten has a point)

Author: Peter Fendrich

Date: 03:45:49 05/27/98



I have deliberately been quiet for a long time about this subject. It
has been hard, but now I can't shut up any more... :)
I have read myriad misunderstandings about how the SSDF rating is done,
under what conditions it is produced, and what it means.

What is really measured?
========================
It is NOT about Fritz5 against Rebel9, Genius5 against Nimzo98, etc.
Rebel9 on a P90 is a completely different player from Rebel9 on a P200,
and that of course holds for all programs. In addition, each computer
type allows all kinds of configurations, which is almost impossible to
cover. Different P200s with the same configuration sometimes have
completely different performances. We have to live with that.

It's a fact that the programs perform differently on different hardware
relative to each other.
The SSDF list shows the strength within the pool of chess programs. It
doesn't show the strength against humans. I'm convinced that there is a
good overlap between these pools, but they are by no means the same.

What are the current conditions?
================================
The most important guideline for this kind of testing is to play against
as many opponents as possible. SSDF tries to do that. However, it
doesn't make sense to play games between opponents with a very big
rating difference.

There is a limited set of testers with a limited set of computers with
different hardware. That implies that there has to be some kind of
prioritization of what to test and when.
The most natural choice is to let the latest programs get the latest
hardware, and that's what the SSDF testers do. Soon there will be
faster hardware and more memory available, and someone has to be first...

Logistic or technical problems have an impact on these guidelines.

What about the accuracy of the ratings?
=======================================
There are two different issues here. The first is whether the ratings
within the list are internally accurate. I would say that this is the
most accurate rating list ever seen in this respect, much more reliable
than any Elo list for GMs and other chess players. So the rating
*difference* between two programs on exactly that hardware is very
reliable.
The other issue is whether the ratings themselves are accurate. That is
very hard to say. The only way to know is to play a lot of real
tournament games against human opponents fighting their best. We don't
have many of those.
The Aegon tournaments are maybe the best we have, and the results there
agree very well with the ratings on SSDF's lists. I wouldn't be very
surprised if there is some inflation of the ratings, but it wouldn't
upset me either...
If they are, say, 50 points too high, just subtract 50 points from each
rating.
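To illustrate why a score between two fixed opponents pins down their
rating *difference*, and why a uniform inflation just shifts every
rating by a constant, here is the standard logistic Elo model. This is
my own sketch, not SSDF's actual formula or code, and the function
names are mine:

```python
import math

def elo_difference(score: float) -> float:
    """Rating difference implied by a score fraction (0 < score < 1),
    under the standard logistic Elo model on the 400-point scale."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def expected_score(diff: float) -> float:
    """Inverse: expected score for a player rated `diff` points higher."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

# A 75% score corresponds to roughly a 191-point rating difference.
# Subtracting 50 points from *both* ratings leaves the difference,
# and hence the expected score, unchanged.
```

Note that only score fractions enter the formula, so the list's
internal differences are untouched by any overall inflation.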

There are a lot of games on ICC and FICS.
The GMs (as well as others) playing there have more or less specialized
in playing chess computers, and they probably do better against
computers than the GM group as a whole. The conditions in general are
somewhat unreliable for rating purposes.

More controlled games under real tournament conditions between computers
and humans would be great!

And....
=======
This is not an official statement from SSDF, and I haven't talked to
Thoralf about this message. I am no longer part of the testing process
myself, but in the very beginning I was the "chief designer" of how the
ratings and confidence levels should be computed and what guidelines to
set for the test procedures, in particular the method of taking
advantage of the fact that computer A on hardware X will always have
the same strength: it will not vary over time as humans do.
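Because a program's strength on fixed hardware doesn't drift, its games
can be treated as independent samples of a fixed scoring probability,
and a confidence interval for its score follows from ordinary sampling
statistics. A minimal sketch of that idea, my own illustration and not
SSDF's published method:

```python
import math

def score_confidence_interval(wins, draws, losses, z=1.96):
    """Normal-approximation confidence interval for the per-game score
    (win = 1, draw = 0.5, loss = 0); z = 1.96 gives roughly 95%."""
    n = wins + draws + losses
    s = (wins + 0.5 * draws) / n          # observed score fraction
    # Sample variance of the per-game score around its mean.
    var = (wins * (1.0 - s) ** 2
           + draws * (0.5 - s) ** 2
           + losses * (0.0 - s) ** 2) / n
    half = z * math.sqrt(var / n)         # half-width of the interval
    return s - half, s + half
```

For example, 60 wins, 20 draws and 20 losses give a score of 0.70 with
an interval of about +/- 0.08; the interval narrows as more games are
played, which is why long match series matter.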

Look at http://home3.swipnet.se/~w-36794/ssdf/
to learn more about this. There is a FAQ.

//Peter



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.