Computer Chess Club Archives


Subject: Re: SSDF Rating list

Author: Bertil Eklund

Date: 14:52:54 06/15/01


On June 15, 2001 at 14:21:04, Chessfun wrote:

>On June 14, 2001 at 05:34:55, Bertil Eklund wrote:
>
>>On June 14, 2001 at 02:49:18, Martin Schubert wrote:
>>
>>>On June 13, 2001 at 18:32:45, Bertil Eklund wrote:
>>>
>>>>On June 13, 2001 at 17:56:08, James T. Walker wrote:
>>>>
>>>>>On June 13, 2001 at 16:14:33, Christophe Theron wrote:
>>>>>
>>>>>>On June 13, 2001 at 11:20:20, James T. Walker wrote:
>>>>>>
>>>>>>>On June 13, 2001 at 00:01:19, Christophe Theron wrote:
>>>>>>>
>>>>>>>>On June 12, 2001 at 22:50:01, James T. Walker wrote:
>>>>>>>>
>>>>>>>>>On June 12, 2001 at 20:54:16, stuart taylor wrote:
>>>>>>>>>
>>>>>>>>>>On June 12, 2001 at 18:41:58, Christophe Theron wrote:
>>>>>>>>>>
>>>>>>>>>>>On June 12, 2001 at 14:48:10, Thoralf Karlsson wrote:
>>>>>>>>>>>
>>>>>>>>>>>>  THE SSDF RATING LIST 2001-06-11   79042 games played by  219 computers
>>>>>>>>>>>>                                           Rating   +     -  Games   Won  Oppo
>>>>>>>>>>>>                                           ------  ---   --- -----   ---  ----
>>>>>>>>>>>>   1 Deep Fritz  128MB K6-2 450 MHz          2653   29   -28   647   64%  2551
>>>>>>>>>>>>   2 Gambit Tiger 2.0  128MB K6-2 450 MHz    2650   43   -40   302   67%  2528
>>>>>>>>>>>>   3 Chess Tiger 14.0 CB 128MB K6-2 450 MHz  2632   43   -40   308   67%  2508
>>>>>>>>>>>>   4 Fritz 6.0  128MB K6-2 450 MHz           2623   23   -23   968   64%  2520
>>>>>>>>>>>>   5 Junior 6.0  128MB K6-2 450 MHz          2596   20   -20  1230   62%  2509
>>>>>>>>>>>>   6 Chess Tiger 12.0 DOS 128MB K6-2 450 MHz 2576   26   -26   733   61%  2499
>>>>>>>>>>>>   7 Fritz 5.32  128MB K6-2 450 MHz          2551   25   -25   804   58%  2496
>>>>>>>>>>>>   8 Nimzo 7.32  128MB K6-2 450 MHz          2550   24   -23   897   58%  2491
>>>>>>>>>>>>   9 Nimzo 8.0  128MB K6-2 450 MHz           2542   28   -28   612   54%  2511
>>>>>>>>>>>>  10 Junior 5.0  128MB K6-2 450 MHz          2534   25   -25   790   58%  2478
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>Congratulations to Frans Morsch and Mathias Feist (and the ChessBase team).
>>>>>>>>>>>
>>>>>>>>>>>Deep Fritz is definitely a very tough client. You cannot lead the SSDF list by
>>>>>>>>>>>accident, and leading it for so many years in a row is probably the best
>>>>>>>>>>>achievement of a chess program of all time.
>>>>>>>>>>>
>>>>>>>>>>>If you want to sum up the history of chess programs for microcomputers, I think
>>>>>>>>>>>you just need to remember 3 names:
>>>>>>>>>>>* Richard Lang
>>>>>>>>>>>* Frans Morsch and Mathias Feist
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>    Christophe
>>>>>>>>>>
>>>>>>>>>>The roaring absence of the name Christophe appears, of course, in the signature
>>>>>>>>>>of the post.
>>>>>>>>>>But I have a little question. Does Deep Fritz have any advantage in the testing
>>>>>>>>>>e.g. the fact that it already stood at the top, long before the recent GT even
>>>>>>>>>>arrived on the scene, and so may have had an advantageous starting point?
>>>>>>>>>>S.Taylor
>>>>>>>>>
>>>>>>>>>Hello Stuart,
>>>>>>>>>I believe that is a valid question.  I would like to know the answer.  I would
>>>>>>>>>like to know if the SSDF "Zeros out" the book learning of say Deep Fritz before
>>>>>>>>>starting a match with Gambit Tiger when Gambit Tiger is brand new?  I still
>>>>>>>>>think the SSDF list is questionable because of the differences in opponents each
>>>>>>>>>program has to face.  I'm sure it's better than nothing but I sure wouldn't like
>>>>>>>>>to hang my hat on a 3 point difference in SSDF ratings (or even 20 points for
>>>>>>>>>that matter).
>>>>>>>>>Jim
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>I don't question the reliability of the list.
>>>>>>>>
>>>>>>>>It is the most reliable tool that we have to evaluate the chess programs. The
>>>>>>>>difference in the opponents each program has to face does not matter from a
>>>>>>>>mathematical point of view.
>>>>>>>>
>>>>>>>>Year after year we can see that the list is reliable. Almost all objections get
>>>>>>>>refuted, little by little. Of course it is not absolutely perfect, but I think
>>>>>>>>it's damn good.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>    Christophe
>>>>>>>
>>>>>>>Hello Christophe,
>>>>>>>I think the thread got sidetracked but I disagree with your assessment of the
>>>>>>>SSDF list.  I agree it's not perfect and it's pretty good but....   I think it's
>>>>>>>too easy to make one program come out on top by selecting the number of games
>>>>>>>played vs certain opponents.  If you could play only one opponent and get a true
>>>>>>>rating then there would be no problem.  We all know this is not the case.  Some
>>>>>>>programs do better against certain opponents and worse vs others.  So if you
>>>>>>>play more games vs the opponent you do best against it will inflate your rating.
>>>>>>> Of course the opposite is true.  So if Program "A" plays its favorite opponent
>>>>>>>while program "B" plays its "nemesis" more games then naturally program "A" will
>>>>>>>look better even though they may be equal or even the opposite is true.  This
>>>>>>>becomes very critical when the difference in rating is only a few points in
>>>>>>>reality.  I'm not saying the SSDF does this on purpose but I'm sure they are
>>>>>>>doing nothing to compensate for this possibility.  In my opinion the best way to
>>>>>>>do the SSDF list would be to make all top programs play an equal number of games
>>>>>>>against the same opponents.  That way the top programs would all play the same
>>>>>>>number of games against the same opponents and the list would look like this:
>>>>>>>
>>>>>>>Name         Rating      Number of games
>>>>>>>Program A    2600        400
>>>>>>>Program B    2590        400
>>>>>>>Program C    2580        400
>>>>>>
>>>>>>
>>>>>>
>>>>>>I cannot think of any real evidence that such a phenomenon exists. Can you
>>>>>>mention amongst the top programs which program gets killed by what other
>>>>>>program?
>>>>>>
>>>>>>Does anyone have statistical evidence of this?
>>>>>>
>>>>>>But anyway, even if all programs meet each other, I know some people will say
>>>>>>that there is another way to bias the results: by allowing or not allowing a
>>>>>>given program to enter the list, you have an influence on the programs it is
>>>>>>supposed to kill.
>>>>>>
>>>>>>It's a neverending story.
>>>>>>
>>>>>>
>>>>>>
>>>>>>    Christophe
>>>>>
>>>>>
>>>>>Hello Christophe,
>>>>>You don't have to get killed or be a killer to change the rating by a few
>>>>>points.  The first program that comes to mind is ChessMaster.  I believe that
>>>>>playing a "Learning" program vs a non-learning program will add rating points to
>>>>>the learning program with more and more games played between them.  If this is
>>>>>not the case then you could just play 500 games vs any opponent you choose and
>>>>>your rating would be just as accurate. In any case this "bias" could be avoided
>>>>>with a little planning.
>>>>>Jim
>>>>
>>>>Ok, and what is wrong now, that favours program x or y?
>>>>
>>>>Bertil
>>>
>>>I doubt that the list favours a program. But I think your idea is to play 40
>>>games in a match, so I wonder why not play exactly 40 games. Sometimes you play
>>>more, sometimes you play less. I don't think it's a big problem playing 39 or 42
>>>games. But it should be no problem playing the same number. The reason I would
>>>prefer this is the statistics. The best thing for getting good statistics for ratings
>>>would be playing a tournament like Cadaques: every program against each other
>>>the same number of games.
>>>
>>>Regards, Martin
>>
>>Hi!
>>
>>Usually we try to play 40-game matches, but in the last list some matches are
>>not finished (11/06 01). In the match Tiger against DF, at 17-17 or so, Tony
>>received the new Athlon parts and of course he upgraded as soon as he received
>>them! In some cases a match could be shorter because of hardware or software
>>problems.
>>
>>Bertil
>
>Personally I agree with Christophe that the SSDF is the most reliable tool
>that is available for rating a program. Some can critique either method
>or a specific but nothing else compares. It is as near perfect IMO as
>testing a commercial program can be. Anyone who bothers to spend time
>looking through the games knows it.
>
>Sarah.

Hi!

Thanks!  When is your next update of the Chessfun-list? I (we) really look
forward to it!

Bertil
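
Editorial note: the two statistical points argued in the thread (how much a 3-point gap means, and whether the mix of opponents can move a rating) can be checked with a rough sketch. It assumes the standard logistic Elo formula, which is not necessarily the exact method the SSDF uses, and the function names are made up for this illustration. The Deep Fritz and Gambit Tiger figures come from the quoted list (where the score percentages are rounded); the "opponent X / opponent Y" program is entirely hypothetical and only illustrates Jim's scenario, which Christophe doubts occurs in practice.

```python
import math


def expected_score(rating_diff):
    """Expected score for a program rated rating_diff points above its opponent."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


def performance_rating(avg_opponent, score_fraction):
    """Rating whose expected score against avg_opponent equals score_fraction."""
    return avg_opponent + 400.0 * math.log10(score_fraction / (1.0 - score_fraction))


def margins_overlap(a, b):
    """a and b are (rating, plus, minus) tuples; True if the error ranges intersect."""
    a_low, a_high = a[0] - a[2], a[0] + a[1]
    b_low, b_high = b[0] - b[2], b[0] + b[1]
    return a_low <= b_high and b_low <= a_high


def pooled_performance(matches):
    """matches: list of (opponent_rating, games, score_fraction) tuples."""
    games = sum(g for _, g, _ in matches)
    avg_opponent = sum(r * g for r, g, _ in matches) / games
    score = sum(s * g for _, g, s in matches) / games
    return performance_rating(avg_opponent, score)


# Figures taken from the quoted list: average opponent rating and score.
deep_fritz = performance_rating(2551, 0.64)
gambit_tiger = performance_rating(2528, 0.67)

# Published ratings with their +/- margins: (rating, plus, minus).
overlap = margins_overlap((2653, 29, 28), (2650, 43, 40))

# Hypothetical program (assumption, not from the list): scores 60% against
# opponent X and 50% against opponent Y, both rated 2500, over 400 games.
mostly_x = pooled_performance([(2500, 300, 0.60), (2500, 100, 0.50)])
mostly_y = pooled_performance([(2500, 100, 0.60), (2500, 300, 0.50)])

print(round(deep_fritz), round(gambit_tiger), overlap)
print(round(mostly_x - mostly_y))
```

Under these assumptions both programs come out at about 2651 from their rounded score percentages, and their published error ranges overlap almost completely, so the 3-point gap at the top says very little on its own. In the hypothetical case, changing only the split of games between the two opponents moves the computed rating by about 35 points, but that result depends on opponent-specific scores actually existing.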




