Computer Chess Club Archives


Subject: Re: SSDF Rating list

Author: James T. Walker

Date: 08:20:20 06/13/01


On June 13, 2001 at 00:01:19, Christophe Theron wrote:

>On June 12, 2001 at 22:50:01, James T. Walker wrote:
>
>>On June 12, 2001 at 20:54:16, stuart taylor wrote:
>>
>>>On June 12, 2001 at 18:41:58, Christophe Theron wrote:
>>>
>>>>On June 12, 2001 at 14:48:10, Thoralf Karlsson wrote:
>>>>
>>>>>  THE SSDF RATING LIST 2001-06-11   79042 games played by  219 computers
>>>>>                                           Rating   +     -  Games   Won  Oppo
>>>>>                                           ------  ---   --- -----   ---  ----
>>>>>   1 Deep Fritz  128MB K6-2 450 MHz          2653   29   -28   647   64%  2551
>>>>>   2 Gambit Tiger 2.0  128MB K6-2 450 MHz    2650   43   -40   302   67%  2528
>>>>>   3 Chess Tiger 14.0 CB 128MB K6-2 450 MHz  2632   43   -40   308   67%  2508
>>>>>   4 Fritz 6.0  128MB K6-2 450 MHz           2623   23   -23   968   64%  2520
>>>>>   5 Junior 6.0  128MB K6-2 450 MHz          2596   20   -20  1230   62%  2509
>>>>>   6 Chess Tiger 12.0 DOS 128MB K6-2 450 MHz 2576   26   -26   733   61%  2499
>>>>>   7 Fritz 5.32  128MB K6-2 450 MHz          2551   25   -25   804   58%  2496
>>>>>   8 Nimzo 7.32  128MB K6-2 450 MHz          2550   24   -23   897   58%  2491
>>>>>   9 Nimzo 8.0  128MB K6-2 450 MHz           2542   28   -28   612   54%  2511
>>>>>  10 Junior 5.0  128MB K6-2 450 MHz          2534   25   -25   790   58%  2478
>>>>
>>>>
>>>>
>>>>Congratulations to Frans Morsch and Mathias Feist (and the ChessBase team).
>>>>
>>>>Deep Fritz is definitely a very tough client. You cannot lead the SSDF list by
>>>>accident, and leading it for so many years in a row is probably the best
>>>>achievement of a chess program of all times.
>>>>
>>>>If you want to sum up the history of chess programs for microcomputers, I think
>>>>you just need to remember 3 names:
>>>>* Richard Lang
>>>>* Frans Morsch and Mathias Feist
>>>>
>>>>
>>>>
>>>>    Christophe
>>>
>>>The roaring absence of the name Christophe appears, of course, in the
>>>signature of the post.
>>>But I have a little question: does Deep Fritz have any advantage in the
>>>testing, e.g. from the fact that it already stood at the top long before the
>>>recent GT even arrived on the scene, and so may have had an advantageous
>>>starting point?
>>>S.Taylor
>>
>>Hello Stuart,
>>I believe that is a valid question, and I would like to know the answer.  Does
>>the SSDF "zero out" the book learning of, say, Deep Fritz before starting a
>>match with Gambit Tiger when Gambit Tiger is brand new?  I still think the
>>SSDF list is questionable because of the differences in the opponents each
>>program has to face.  I'm sure it's better than nothing, but I sure wouldn't
>>like to hang my hat on a 3-point difference in SSDF ratings (or even 20
>>points, for that matter).
>>Jim
>
>
>
>I don't question the reliability of the list.
>
>It is the most reliable tool that we have to evaluate the chess programs. The
>difference in the opponents each program has to face does not matter from a
>mathematical point of view.
>
>Year after year we can see that the list is reliable. Almost all objections get
>refuted, little by little. Of course it is not absolutely perfect, but I think
>it's damn good.
>
>
>
>    Christophe

Hello Christophe,
I think the thread got sidetracked, but I disagree with your assessment of the
SSDF list.  I agree it's not perfect and that it's pretty good, but I think it's
too easy for one program to come out on top simply through the number of games
it plays against particular opponents.  If you could play only one opponent and
get a true rating there would be no problem, but we all know that is not the
case.  Some programs do better against certain opponents and worse against
others, so playing more games against the opponent you do best against inflates
your rating, and playing more against the opponent you do worst against deflates
it.  If program "A" plays its favorite opponent more often while program "B"
plays its "nemesis" more often, then program "A" will naturally look better,
even though the two may be equal or the reverse may even be true.  This becomes
critical when the real difference in rating is only a few points.  I'm not
saying the SSDF does this on purpose, but I'm sure they do nothing to compensate
for the possibility (the simulation sketch after the table below illustrates the
effect).  In my opinion the best way to do the SSDF list would be to make all
the top programs play an equal number of games against the same opponents.  The
list would then look like this:

Name         Rating      Number of games
Program A    2600        400
Program B    2590        400
Program C    2580        400
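
To put a rough number on the matchup effect, here is a minimal Monte Carlo
sketch in Python.  Everything in it is hypothetical: one program with a true
strength of 2600 faces two opponents both rated 2500, scoring as if it were 100
points stronger against its "favorite" and 100 points weaker against its
"nemesis", and draws are ignored.  It only shows that the same program can post
very different performance ratings depending on which opponent fills most of
its schedule:

    import math
    import random

    def expected(diff):
        # Elo expected score for a rating difference 'diff' (logistic curve)
        return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

    def perf_rating(score_pct, oppo_avg):
        # performance rating from score fraction and average opponent rating
        return oppo_avg + 400.0 * math.log10(score_pct / (1.0 - score_pct))

    def simulate(games_vs_favorite, games_vs_nemesis, seed=1):
        # one program (true strength 2600) vs two opponents both rated 2500;
        # the style matchup is modeled as a hypothetical +/-100 shift in
        # effective strength: +100 vs the favorite, -100 vs the nemesis
        rng = random.Random(seed)
        probs = ([expected(2600 + 100 - 2500)] * games_vs_favorite +
                 [expected(2600 - 100 - 2500)] * games_vs_nemesis)
        wins = sum(1 for p in probs if rng.random() < p)
        return perf_rating(wins / len(probs), 2500.0)

    # same program, same true strength -- only the opponent mix differs
    print("favorite-heavy schedule: %.0f" % simulate(300, 100))  # about 2640
    print("nemesis-heavy schedule:  %.0f" % simulate(100, 300))  # about 2545

With this (admittedly large) 100-point matchup effect, the two schedules end up
roughly 100 rating points apart even though the program's true strength never
changed.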


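As an aside, the rating and error-bar columns in the list quoted at the top of
the thread are consistent with the standard Elo formulas.  Here is a quick check
against the Deep Fritz row (647 games, 64% against 2551 average opposition),
treating each game as an independent win or loss; ignoring draws overstates the
variance somewhat, and the SSDF's exact method isn't stated here:

    import math

    def perf_rating(p, oppo_avg):
        # Elo performance rating from score fraction p and opponents' average
        return oppo_avg + 400.0 * math.log10(p / (1.0 - p))

    def margin_95(p, n):
        # approximate 95% error bar in Elo points over n independent games
        se_score = math.sqrt(p * (1.0 - p) / n)          # std error of score %
        elo_per_point = 400.0 / (math.log(10.0) * p * (1.0 - p))  # d(rating)/d(p)
        return 1.96 * se_score * elo_per_point

    # Deep Fritz row: 647 games, 64% score, 2551 average opposition
    print("rating: %.0f" % perf_rating(0.64, 2551.0))   # ~2651 (list: 2653)
    print("margin: +/- %.0f" % margin_95(0.64, 647))    # ~28 (list: +29/-28)

Both numbers land within a couple of points of the published figures.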



