Author: Bertil Eklund
Date: 14:52:54 06/15/01
On June 15, 2001 at 14:21:04, Chessfun wrote:

>On June 14, 2001 at 05:34:55, Bertil Eklund wrote:
>
>>On June 14, 2001 at 02:49:18, Martin Schubert wrote:
>>
>>>On June 13, 2001 at 18:32:45, Bertil Eklund wrote:
>>>
>>>>On June 13, 2001 at 17:56:08, James T. Walker wrote:
>>>>
>>>>>On June 13, 2001 at 16:14:33, Christophe Theron wrote:
>>>>>
>>>>>>On June 13, 2001 at 11:20:20, James T. Walker wrote:
>>>>>>
>>>>>>>On June 13, 2001 at 00:01:19, Christophe Theron wrote:
>>>>>>>
>>>>>>>>On June 12, 2001 at 22:50:01, James T. Walker wrote:
>>>>>>>>
>>>>>>>>>On June 12, 2001 at 20:54:16, stuart taylor wrote:
>>>>>>>>>
>>>>>>>>>>On June 12, 2001 at 18:41:58, Christophe Theron wrote:
>>>>>>>>>>
>>>>>>>>>>>On June 12, 2001 at 14:48:10, Thoralf Karlsson wrote:
>>>>>>>>>>>
>>>>>>>>>>>> THE SSDF RATING LIST 2001-06-11   79042 games played by 219 computers
>>>>>>>>>>>>                                            Rating    +    -  Games  Won  Oppo
>>>>>>>>>>>>                                            ------  ---  ---  -----  ---  ----
>>>>>>>>>>>>  1 Deep Fritz 128MB K6-2 450 MHz             2653   29  -28    647  64%  2551
>>>>>>>>>>>>  2 Gambit Tiger 2.0 128MB K6-2 450 MHz       2650   43  -40    302  67%  2528
>>>>>>>>>>>>  3 Chess Tiger 14.0 CB 128MB K6-2 450 MHz    2632   43  -40    308  67%  2508
>>>>>>>>>>>>  4 Fritz 6.0 128MB K6-2 450 MHz              2623   23  -23    968  64%  2520
>>>>>>>>>>>>  5 Junior 6.0 128MB K6-2 450 MHz             2596   20  -20   1230  62%  2509
>>>>>>>>>>>>  6 Chess Tiger 12.0 DOS 128MB K6-2 450 MHz   2576   26  -26    733  61%  2499
>>>>>>>>>>>>  7 Fritz 5.32 128MB K6-2 450 MHz             2551   25  -25    804  58%  2496
>>>>>>>>>>>>  8 Nimzo 7.32 128MB K6-2 450 MHz             2550   24  -23    897  58%  2491
>>>>>>>>>>>>  9 Nimzo 8.0 128MB K6-2 450 MHz              2542   28  -28    612  54%  2511
>>>>>>>>>>>> 10 Junior 5.0 128MB K6-2 450 MHz             2534   25  -25    790  58%  2478
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>Congratulations to Frans Morsch and Mathias Feist (and the ChessBase team).
>>>>>>>>>>>
>>>>>>>>>>>Deep Fritz is definitely a very tough client.
>>>>>>>>>>>You cannot lead the SSDF list by
>>>>>>>>>>>accident, and leading it for so many years in a row is probably the best
>>>>>>>>>>>achievement of a chess program of all time.
>>>>>>>>>>>
>>>>>>>>>>>If you want to sum up the history of chess programs for microcomputers, I think
>>>>>>>>>>>you just need to remember 3 names:
>>>>>>>>>>>* Richard Lang
>>>>>>>>>>>* Frans Morsch and Mathias Feist
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>    Christophe
>>>>>>>>>>
>>>>>>>>>>The roaring absence of the name Christophe appears, of course, in the signature
>>>>>>>>>>of the post.
>>>>>>>>>>But I have a little question. Does Deep Fritz have any advantage in the testing,
>>>>>>>>>>e.g. the fact that it already stood at the top, long before the recent GT even
>>>>>>>>>>arrived on the scene, and so may have had an advantageous starting point?
>>>>>>>>>>S.Taylor
>>>>>>>>>
>>>>>>>>>Hello Stuart,
>>>>>>>>>I believe that is a valid question. I would like to know the answer. I would
>>>>>>>>>like to know if the SSDF "zeros out" the book learning of, say, Deep Fritz before
>>>>>>>>>starting a match with Gambit Tiger when Gambit Tiger is brand new. I still
>>>>>>>>>think the SSDF list is questionable because of the differences in opponents each
>>>>>>>>>program has to face. I'm sure it's better than nothing but I sure wouldn't like
>>>>>>>>>to hang my hat on a 3 point difference in SSDF ratings (or even 20 points for
>>>>>>>>>that matter).
>>>>>>>>>Jim
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>I don't question the reliability of the list.
>>>>>>>>
>>>>>>>>It is the most reliable tool that we have to evaluate the chess programs. The
>>>>>>>>difference in the opponents each program has to face does not matter from a
>>>>>>>>mathematical point of view.
>>>>>>>>
>>>>>>>>Year after year we can see that the list is reliable. Almost all objections get
>>>>>>>>refuted, little by little. Of course it is not absolutely perfect, but I think
>>>>>>>>it's damn good.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>    Christophe
>>>>>>>
>>>>>>>Hello Christophe,
>>>>>>>I think the thread got sidetracked but I disagree with your assessment of the
>>>>>>>SSDF list. I agree it's not perfect and it's pretty good but.... I think it's
>>>>>>>too easy to make one program come out on top by selecting the number of games
>>>>>>>played vs certain opponents. If you could play only one opponent and get a true
>>>>>>>rating then there would be no problem. We all know this is not the case. Some
>>>>>>>programs do better against certain opponents and worse vs others. So if you
>>>>>>>play more games vs the opponent you do best against it will inflate your rating.
>>>>>>>Of course the opposite is true. So if Program "A" plays its favorite opponent
>>>>>>>while program "B" plays its "nemesis" more games then naturally program "A" will
>>>>>>>look better even though they may be equal or even the opposite is true. This
>>>>>>>becomes very critical when the difference in rating is only a few points in
>>>>>>>reality. I'm not saying the SSDF does this on purpose but I'm sure they are
>>>>>>>doing nothing to compensate for this possibility. In my opinion the best way to
>>>>>>>do the SSDF list would be to make all top programs play an equal number of games
>>>>>>>against the same opponents. That way the top programs would all play the same
>>>>>>>number of games against the same opponents and the list would look like this:
>>>>>>>
>>>>>>>Name        Rating   Number of games
>>>>>>>Program A   2600     400
>>>>>>>Program B   2590     400
>>>>>>>Program C   2580     400
>>>>>>
>>>>>>
>>>>>>
>>>>>>I cannot think of any real evidence that such a phenomenon exists. Can you
>>>>>>mention amongst the top programs which program gets killed by what other
>>>>>>program?
>>>>>>
>>>>>>Has someone statistical evidence of this?
>>>>>>
>>>>>>But anyway, even if all programs meet each other, I know some people will say
>>>>>>that there is another way to bias the results: by letting a given program
>>>>>>enter or not enter the list you have an influence on the programs it is
>>>>>>supposed to kill.
>>>>>>
>>>>>>It's a neverending story.
>>>>>>
>>>>>>
>>>>>>
>>>>>>    Christophe
>>>>>
>>>>>
>>>>>Hello Christophe,
>>>>>You don't have to get killed or be a killer to change the rating by a few
>>>>>points. The first program that comes to mind is ChessMaster. I believe that
>>>>>playing a "Learning" program vs a non-learning program will add rating points to
>>>>>the learning program with more and more games played between them. If this is
>>>>>not the case then you could just play 500 games vs any opponent you choose and
>>>>>your rating would be just as accurate. In any case this "bias" could be avoided
>>>>>with a little planning.
>>>>>Jim
>>>>
>>>>Ok, and what is wrong now, that favours program x or y?
>>>>
>>>>Bertil
>>>
>>>I doubt that the list favours a program. But I think your idea is to play 40
>>>games in a match, so I wonder why not play exactly 40 games. Sometimes you play
>>>more, sometimes you play less. I don't think it's a big problem playing 39 or 42
>>>games. But it should be no problem playing the same number. Why I would prefer
>>>this is the statistics. The best thing for getting good statistics for ratings
>>>would be playing a tournament like Cadaques: every program against each other
>>>the same number of games.
>>>
>>>Regards, Martin
>>
>>Hi!
>>
>>Usually we try to play 40-game matches but from the last list some matches are
>>not finished (11/06 01). In the match Tiger against DF 17-17 or so, Tony
>>received the new Athlon parts and of course he upgraded as soon as he received
>>them! In some cases the match could be shorter because of hardware or software
>>problems.
>>
>>Bertil
>
>Personally I agree with Christophe: the SSDF is the most reliable tool
>that is available for rating a program. Some can critique either method
>or a specific but nothing else compares. It is as near perfect IMO as
>testing a commercial program can be. Anyone who bothers to spend time
>looking through the games knows it.
>
>Sarah.

Hi!

Thanks! When is your next update of the Chessfun-list? I (we) really look forward to it!

Bertil
Last modified: Thu, 15 Apr 21 08:11:13 -0700