Author: Ed Schröder
Date: 16:48:40 03/05/04
On March 05, 2004 at 14:24:14, Vincent Diepeveen wrote:

>here an email I had forwarded from uni to home:
>
>Another thing here. Please show me one occasion where SSDF harmed Fritz. I can
>show you for about every other program problems that occurred with it in one
>way or another, but never for Fritz.

This is all old stuff, all solved. Ed

>-------------------------------
>
>Date: Mon, 29 Mar 1999 11:24:12 +0200
>To: "Thoralf Karlsson" <and a huge CC list>
>From: Ed Schroder <schroder@xs4all.nl>
>Subject: Re: New rating list
>
>
>>  1 Chessmaster 6000 64MB P200 MMX   2576   88  -71   100   78%   2363
>
>My warm congrats to Johan in the first place!
>
>> The SSDF rating list is made with the ambition to provide reliable
>> information about the playing strength of chess programs and chess
>> computers.
>>
>> The ideal way to reach this goal would be to play hundreds of tournament
>> games with each program against humans in the same country, or against
>> internationally Elo-rated players. Unfortunately this is an impossible
>> task.
>>
>> We have chosen to play matches of 20-40 games between a new program and
>> as many as possible of the earlier tested programs. The goal is to get
>> about 600-800 tournament games for each program. This gives us a relative
>> rating which, with 95% probability, is within a range of +-25-30 points.
>>
>> Hopefully this rating would roughly be achieved if the same number of
>> games were played against human players. But we cannot be sure about
>> that. The level of the rating list is based on the results of 337 games
>> against Swedish chess players, played in serious tournaments between 1987
>> and 1991. The chess computers used then were of course much weaker than
>> today's top programs running on fast hardware.
>>
>> Many things have to be considered when the ratings are interpreted. Is
>> there a difference between the Swedish rating level and the level in
>> other countries? It seems so. How does the Swedish level correlate to
>> the level in the Elo system?
>>
>> Are some chess programs relatively better or worse against humans than
>> other programs? It could be so. But in order to prove that, you would
>> have to play hundreds of games against humans with several of the chess
>> programs, and this has not been done.
>>
>> Is an improvement of, say, 400 rating points on the SSDF rating list
>> comparable with the same improvement against humans? Or could there be a
>> "spreading" effect when games between computers are played?
>>
>> The results in the Aegon tournaments 1992-1997, where up to 300 games
>> against humans were played each year, showed ratings slightly above what
>> could be expected from the rating list. This indicates that the level of
>> the list is rather accurate, but doesn't prove it. The rate of play at
>> Aegon was somewhat faster than tournament play, and the humans didn't
>> have Swedish ratings.
>
> TOP 12 COMPUTER PROGRAMS AT AEGON, 1991-1997
> ----------------------------------------------------------------
>  1. Rebel                    +21  = 8  - 7    25/36    = 69.4%
>  2. Chess Genius             +15  = 9  - 6    19½/30   = 65.0%
>  3. Chessica (Fritz)         + 9  = 5  - 4    11½/18   = 63.8%
>  4. Hiarcs                   +16  = 6  - 8    19/30    = 63.3%
>  5. The King (Chessmaster)   +22  =14  -12    29/48    = 60.4%
>  6. M-Chess Pro              +26  = 5  -17    28½/48   = 59.3%
>  7. Chessmaster (The King)   + 8  = 5  - 5    10½/18   = 58.3%
>  8. Virtual Chess            + 9  = 3  - 6    10½/18   = 58.3%
>  9. Fritz                    +14  =13  - 9    20½/36   = 56.9%
> 10. Quest (Fritz)            +15  =11  -10    20½/36   = 56.9%
> 11. Kallisto                 +17  = 6  -13    20/36    = 55.5%
> 12. Nimzo                    +12  = 6  -12    15/30    = 50.0%
> ----------------------------------------------------------------
>
>This Aegon statistic more or less implies the opposite: that you can't
>compare comp-comp results with human-comp results.
>
>> More strictly speaking, the SSDF rating list provides information about
>> the relative ratings of chess programs when they are tested in the way
>> SSDF has chosen.
>
>You provide comp-comp information only.
>
>> Could the result be different with another testing method? Yes, it
>> could. We are presenting results at the tournament level, 40 moves in
>> two hours. Games played at blitz level, or with one minute per move,
>> would probably give a different ordering of the programs.
>>
>> The same would most likely happen if we had used a faster or different
>> processor. Some programs run relatively better on an AMD processor with
>> 64 KB of level-1 cache than on a P200 MMX.
>>
>> Programs with good learners benefit from long series of games against
>> the same opponent. If we could somehow change the opponent automatically
>> after each game, the result might differ.
>>
>> Obviously, the SSDF rating list doesn't say anything about programs
>> which haven't been tested by us. Our ambition is to include all of the
>> strongest commercially available programs. Sometimes we are also able to
>> play with amateur programs.
>>
>> At the moment some of the stronger programs are missing because of legal
>> (perhaps illegal?) threats. Another reason for programs not being
>> included is that they cannot be tested automatically with auto232, and
>> most testers are not willing to go back to time-consuming manual
>> testing.
>
>If you start testing programs in a fair way (that is, admit and correct
>mistakes when they are made) some producers may reconsider their policy.
>
>SSDF Rebel9 testing.............
>
>#1. Over 100 games were played without the main book loaded.
>As a result, Rebel9 played with a book of just 4.5 KB instead of the main
>book of 400 KB.
>
>#2. You did not test Rebel9 at its strongest settings. In more than 100
>games Rebel9 played without its learner. I stopped counting after 100
>games.
>
>> Thanks to four of SSDF's testers, we can now include one of these
>> programs without auto232; it has just reached the minimum number of
>> games required for inclusion. It is Johan de Koning's Chessmaster 6000
>> on P200 MMX, which after one hundred games takes a shared first place on
>> the list with 2576!!
>>
>> Compared to the earlier program version, CM5000, which we tested on a
>> P90, the new program seems to be about 110 points stronger! I cannot
>> deny that the result surprised me somewhat. Still, the rating is based
>> on only 100 games, with a margin of error of 70-80 points, so much could
>> change if we succeed in playing more games.
>>
>> Fritz 5.32 has gone up one point, which gives it a shared first place
>> together with CM6000. Hiarcs 7.0 P200 MMX has lost 9 points and now
>> takes third place.
>>
>> Chrilly Donninger's program Nimzo 99 P200 MMX is new on the list. After
>> 390 games it has a rating of 2565, which gives it fourth place. Compared
>> to Nimzo 98, the rating has gone up by 42 points!
>>
>> Most of the games were played with a program version dated 98-12-14, the
>> rest with a version from 98-11-01. The possibility to tell Nimzo 99
>> whether it plays against Rebel, Junior or Fritz has been used in one or
>> two of the matches.
>
>Also unfair.
>
>Amazing that you have included these results.
>
>The next step (similar to telling Nimzo 99 its opponent) is to allow an
>"opponent-specific book", built from won autoplay games, to be loaded.
>
>Please explain what this has to do with your statement above:
>
>[ BEGIN ]
>
>We have chosen to play matches of 20-40 games between a new program and as
>many as possible of the earlier tested programs. The goal is to get about
>600-800 tournament games for each program. This gives us a relative rating
>which, with 95% probability, is within a range of +-25-30 points.
>
>[ END ]
>
>Ed Schroder
>
>> Junior 5 has lost 11 points compared to the latest list, and Atlanta has
>> gained 9 points.
>>
>> For the near future we intend to play more games with Nimzo 99 and
>> CM6000, and probably to start with Crafty. As soon as we decide on the
>> next hardware level, the best programs will be retested on the faster
>> processor.
>>
>> The next list will appear in May.
>>
>> Thoralf Karlsson

---------------------------------
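The error margins Thoralf quotes (+-25-30 points at 600-800 games, 70-80
points at 100 games) follow, roughly, from the binomial spread of a match
score under the standard Elo model. Below is a minimal Python sketch of that
calculation; the 30% draw rate and the assumption of evenly matched
opposition are mine, not from the post, and elo_margin is just an
illustrative helper name.

  import math

  def elo_margin(games, draw_rate=0.30, z=1.96):
      # A single game scores 0, 1/2 or 1; with symmetric win/loss chances
      # its variance is (1 - draw_rate) / 4, so the standard error of the
      # mean score over `games` games is:
      se_score = math.sqrt((1.0 - draw_rate) / (4.0 * games))
      # Slope of the Elo curve at a 50% score:
      # d(Elo)/d(score) = 400 / (ln 10 * 0.25), about 695 points per unit.
      elo_per_score = 400.0 / (math.log(10) * 0.25)
      return z * elo_per_score * se_score

  for n in (100, 600, 800):
      print("%4d games -> +/- %.0f Elo" % (n, elo_margin(n)))

This prints roughly +/-57 at 100 games and +/-20-23 at 600-800 games. The
SSDF's published margins are somewhat wider, which is plausible: opponents
vary in strength, and long series against the same opponent (books,
learners) make the games less than independent.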
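In the same spirit, the Aegon percentages in the table above can be turned
into performance differences with the logistic Elo formula
D = -400 * log10(1/p - 1). Another small sketch; elo_diff is again an
illustrative name, and the three entries are copied from the table.

  import math

  def elo_diff(score):
      # Elo difference implied by a fractional score p, with 0 < p < 1.
      return -400.0 * math.log10(1.0 / score - 1.0)

  # (points, games) for three rows of the Aegon table.
  aegon = {"Rebel": (25.0, 36), "Chess Genius": (19.5, 30), "Fritz": (20.5, 36)}
  for name, (points, games) in aegon.items():
      p = points / games
      print("%-12s %5.1f%% -> %+4.0f Elo vs. its opposition" % (name, 100 * p, elo_diff(p)))

Rebel's 69.4% comes out at about +140 Elo over its human opposition, Chess
Genius at about +108, and Fritz at about +49. Whether those spreads would
match the comp-comp spreads on the SSDF list is exactly the question being
argued in this thread.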