Author: Ed Schröder
Date: 16:48:40 03/05/04
On March 05, 2004 at 14:24:14, Vincent Diepeveen wrote:

>here an email I had forwarded from uni to home:
>
>Another thing here. Please show me one occasion where SSDF harmed Fritz. I can
>show you for about every other program problems that occurred with it in one
>way or another, but never for Fritz.

This is all old stuff, all solved. Ed

>-------------------------------
>
>Date: Mon, 29 Mar 1999 11:24:12 +0200
>To: "Thoralf Karlsson" <and a huge CC list>
>From: Ed Schroder <schroder@xs4all.nl>
>Subject: Re: New rating list
>
>
>>  1 Chessmaster 6000 64MB P200 MMX   2576   88  -71   100   78%   2363
>
>My warm congrats to Johan in the first place!
>
>> The SSDF rating list is made with the ambition to provide reliable
>> information about the playing strength of chess programs and chess
>> computers.
>>
>> The ideal way to reach this goal would be to play hundreds of tournament
>> games with each program against humans in the same country, or against
>> internationally Elo-rated players. Unfortunately this is an impossible
>> task.
>>
>> We have chosen to play matches of 20-40 games between a new program and
>> as many as possible of the earlier tested programs. The goal is to get
>> about 600-800 tournament games for each program. This gives us a relative
>> rating which, with 95% probability, is within a range of +-25-30 points.
>>
>> Hopefully this rating would roughly be achieved if the same number of
>> games were played against human players. But we cannot be sure about
>> that. The level of the rating list is based on the results of 337 games
>> against Swedish chess players, played in serious tournaments between 1987
>> and 1991. The chess computers used then were of course much weaker than
>> today's top programs running on fast hardware.
>>
>> Many things have to be considered when the ratings are interpreted. Is
>> there a difference between the Swedish rating level and the level in
>> other countries? It seems so. How does the Swedish level correlate to
>> the level in the Elo system?
>>
>> Are some chess programs relatively better or worse against humans than
>> other programs? It could be so. But in order to prove that, you would
>> have to play hundreds of games against humans with several of the chess
>> programs, and this has not been done.
>>
>> Is an improvement of, say, 400 rating points on the SSDF rating list
>> comparable with the same improvement against humans? Or could there be a
>> "spreading" effect when games between computers are played?
>>
>> The results in the Aegon tournaments 1992-1997, where up to 300 games
>> against humans were played each year, showed ratings slightly above what
>> could be expected from the rating list. This indicates that the level of
>> the list is rather accurate, but doesn't prove it. The rate of play at
>> Aegon was somewhat faster than tournament play, and the humans didn't
>> have Swedish ratings.
>
> TOP 12 COMPUTER PROGRAMS AT AEGON, 1991-1997
> ----------------------------------------------------------------
>  1. Rebel                    +21  = 8  - 7    25/36    = 69.4%
>  2. Chess Genius             +15  = 9  - 6    19½/30   = 65.0%
>  3. Chessica (Fritz)         + 9  = 5  - 4    11½/18   = 63.8%
>  4. Hiarcs                   +16  = 6  - 8    19/30    = 63.3%
>  5. The King (Chessmaster)   +22  =14  -12    29/48    = 60.4%
>  6. M-Chess Pro              +26  = 5  -17    28½/48   = 59.3%
>  7. Chessmaster (The King)   + 8  = 5  - 5    10½/18   = 58.3%
>  8. Virtual Chess            + 9  = 3  - 6    10½/18   = 58.3%
>  9. Fritz                    +14  =13  - 9    20½/36   = 56.9%
> 10. Quest (Fritz)            +15  =11  -10    20½/36   = 56.9%
> 11. Kallisto                 +17  = 6  -13    20/36    = 55.5%
> 12. Nimzo                    +12  = 6  -12    15/30    = 50.0%
> ----------------------------------------------------------------
>
>This Aegon statistic more or less implies the opposite: that you can't
>compare comp-comp results with human-comp results.
>
>> More strictly speaking, the SSDF rating list provides information about
>> the relative ratings of chess programs when they are tested in the way
>> SSDF has chosen.
>
>You provide comp-comp information only.
>
>> Could the result be different with another testing method? Yes, it
>> could. We are presenting results at the tournament level, 40 moves in
>> two hours. Games played at blitz level, or with one minute per move,
>> would probably give a different ordering of the programs.
>>
>> The same would most likely happen if we had used a faster or different
>> processor. Some programs run relatively better on an AMD processor with
>> 64 KB of level-1 cache than on a P200 MMX.
>>
>> Programs with good learners benefit from long series of games against
>> the same opponent. If we could somehow change the opponent automatically
>> after each game, the result might differ.
>>
>> Obviously, the SSDF rating list doesn't say anything about programs
>> which haven't been tested by us. Our ambition is to include all of the
>> strongest commercially available programs. Sometimes we are also able to
>> play with amateur programs.
>>
>> At the moment some of the stronger programs are missing because of legal
>> (perhaps illegal?) threats. Another reason for programs not being
>> included is that they cannot be tested automatically with auto232, and
>> most testers are not willing to go back to time-consuming manual
>> testing.
>
>If you start testing programs in a fair way (that is, admit and correct
>mistakes when they are made) some producers may reconsider their policy.
>
>SSDF Rebel9 testing.............
>
>#1. Over 100 games were played without the main book loaded.
>As a result, Rebel9 played with a book of just 4.5 KB instead of the main
>book of 400 KB.
>
>#2. You did not test Rebel9 at its strongest settings. In more than 100
>games Rebel9 played without its learner. I stopped counting after 100
>games.
>
>> Thanks to four of SSDF's testers, we can now include one of these
>> programs without auto232; it has just reached the minimum number of
>> games required for inclusion. It is Johan de Koning's Chessmaster 6000
>> on P200 MMX, which after one hundred games takes a shared first place on
>> the list with 2576!!
>>
>> Compared to the earlier program version, CM5000, which we tested on a
>> P90, the new program seems to be about 110 points stronger! I cannot
>> deny that the result surprised me somewhat. Still, the rating is based
>> on only 100 games, with a margin of error of 70-80 points, so much could
>> change if we succeed in playing more games.
>>
>> Fritz 5.32 has gone up one point, which gives it a shared first place
>> together with CM6000. Hiarcs 7.0 P200 MMX has lost 9 points and now
>> takes third place.
>>
>> Chrilly Donninger's program Nimzo 99 P200 MMX is new on the list. After
>> 390 games it has a rating of 2565, which gives it fourth place. Compared
>> to Nimzo 98, the rating has gone up by 42 points!
>>
>> Most of the games were played with a program version dated 98-12-14, the
>> rest with a version from 98-11-01. The possibility to tell Nimzo 99
>> whether it plays against Rebel, Junior or Fritz has been used in one or
>> two of the matches.
>
>Also unfair.
>
>Amazing that you have included these results.
>
>The next step (similar to telling Nimzo 99 its opponent) is to allow an
>"opponent-specific book", built from won autoplay games, to be loaded.
>
>Please explain what this has to do with your statement above:
>
>[ BEGIN ]
>
>We have chosen to play matches of 20-40 games between a new program and as
>many as possible of the earlier tested programs. The goal is to get about
>600-800 tournament games for each program. This gives us a relative rating
>which, with 95% probability, is within a range of +-25-30 points.
>
>[ END ]
>
>Ed Schroder
>
>> Junior 5 has lost 11 points compared to the latest list, and Atlanta has
>> gained 9 points.
>>
>> For the near future we intend to play more games with Nimzo 99 and
>> CM6000, and probably to start with Crafty. As soon as we decide on the
>> next hardware level, the best programs will be retested on the faster
>> processor.
>>
>> The next list will appear in May.
>>
>> Thoralf Karlsson

---------------------------------
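The error margins Thoralf quotes (+-25-30 points at 600-800 games, 70-80
points at 100 games) follow, roughly, from the binomial spread of a match
score under the standard Elo model. Below is a minimal Python sketch of that
calculation; the 30% draw rate and the assumption of evenly matched
opposition are mine, not from the post, and elo_margin is just an
illustrative helper name.

  import math

  def elo_margin(games, draw_rate=0.30, z=1.96):
      # A single game scores 0, 1/2 or 1; with symmetric win/loss chances
      # its variance is (1 - draw_rate) / 4, so the standard error of the
      # mean score over `games` games is:
      se_score = math.sqrt((1.0 - draw_rate) / (4.0 * games))
      # Slope of the Elo curve at a 50% score:
      # d(Elo)/d(score) = 400 / (ln 10 * 0.25), about 695 points per unit.
      elo_per_score = 400.0 / (math.log(10) * 0.25)
      return z * elo_per_score * se_score

  for n in (100, 600, 800):
      print("%4d games -> +/- %.0f Elo" % (n, elo_margin(n)))

This prints roughly +/-57 at 100 games and +/-20-23 at 600-800 games. The
SSDF's published margins are somewhat wider, which is plausible: opponents
vary in strength, and long series against the same opponent (books,
learners) make the games less than independent.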
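In the same spirit, the Aegon percentages in the table above can be turned
into performance differences with the logistic Elo formula
D = -400 * log10(1/p - 1). Another small sketch; elo_diff is again an
illustrative name, and the three entries are copied from the table.

  import math

  def elo_diff(score):
      # Elo difference implied by a fractional score p, with 0 < p < 1.
      return -400.0 * math.log10(1.0 / score - 1.0)

  # (points, games) for three rows of the Aegon table.
  aegon = {"Rebel": (25.0, 36), "Chess Genius": (19.5, 30), "Fritz": (20.5, 36)}
  for name, (points, games) in aegon.items():
      p = points / games
      print("%-12s %5.1f%% -> %+4.0f Elo vs. its opposition" % (name, 100 * p, elo_diff(p)))

Rebel's 69.4% comes out at about +140 Elo over its human opposition, Chess
Genius at about +108, and Fritz at about +49. Whether those spreads would
match the comp-comp spreads on the SSDF list is exactly the question being
argued in this thread.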