Computer Chess Club Archives



Subject: More old postings

Author: Vincent Diepeveen

Date: 11:24:14 03/05/04



Here is an email I had forwarded from uni to home:

Another thing here. Please show me one occasion where SSDF harmed Fritz. I can
show you problems that occurred in one way or another with about every other
program, but never with Fritz.

-------------------------------

Received: from smtp3.xs4all.nl (smtp3.xs4all.nl [194.109.6.53])
        by maildrop.xs4all.nl (8.8.8/8.8.8) with ESMTP id LAA21875
        for <diep@xs4all.nl>; Mon, 29 Mar 1999 11:27:16 +0200 (CEST)
Received: from mail.students.cs.uu.nl (vmailer@solar.students.cs.uu.nl
[131.211.82.28])
        by smtp3.xs4all.nl (8.8.8/8.8.8) with ESMTP id LAA05460
        for <diep@xs4all.nl>; Mon, 29 Mar 1999 11:27:16 +0200 (CEST)
Received: by mail.students.cs.uu.nl (Postfix)
        id B4A3785E3; Mon, 29 Mar 1999 11:27:15 +0200 (MET DST)
Delivered-To: vdiepeve@students.cs.ruu.nl
Received: from smtp1.xs4all.nl (smtp1.xs4all.nl [194.109.6.51])
        by mail.students.cs.uu.nl (Postfix) with ESMTP id 2D6FC85E2
        for <vdiepeve@students.cs.ruu.nl>; Mon, 29 Mar 1999 11:27:15 +0200 (MET
DST)
Received: from schroder (dc2-modem2215.dial.xs4all.nl [194.109.136.167])
        by smtp1.xs4all.nl (8.8.8/8.8.8) with SMTP id LAA21809;
        Mon, 29 Mar 1999 11:23:31 +0200 (CEST)
Message-Id: <Version.32.19990329103057.00ea9d70@mail.xs4all.nl>
X-Sender: schroder@mail.xs4all.nl
X-Mailer: QUALCOMM Windows Eudora Pro Version 4.0
Date: Mon, 29 Mar 1999 11:24:12 +0200
To: "Thoralf Karlsson"
<and a huge CC list>
From: Ed Schroder <schroder@xs4all.nl>
Subject: Re: New rating list
In-Reply-To: <000f01be7945$4203c440$6b6bf482@default>
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by maildrop.xs4all.nl id
LAA21875
X-UIDL: 922699637.maildrop.21877


>
>    1 Chessmaster 6000  64MB P200 MMX         2576   88   -71   100   78%   2363

First of all, my warm congratulations to Johan!


>
> The SSDF rating list is made with the ambition
> to provide reliable information about the playing
> strength of chess programs and chess computers.
>  
> The ideal way to reach this goal would be to play
> hundreds of tournament games with each program
> against humans in the same country or against
> international ELO-rated players. Unfortunately
> this is an impossible task.
>  
> We have chosen to play matches with 20-40 games
> between a new program and as many as possible of
> the earlier tested programs. The goal is to get
> about 600 - 800 tournament games for each program.
> This gives us a relative rating, which with
> 95% probability is within a range of +-25-30 points.
>  
> Hopefully this rating could roughly be achieved if
> the same number of games were to be played against
> human players. But we cannot be sure about that. The
> level of the rating list is based on the results of
> 337 games against Swedish chess players, played in
> serious tournaments between 1987 and 1991. The chess
> computers then used were of course much weaker than
> today's top programs running on fast hardware.
>  
> Many things have to be considered when the ratings
> are interpreted. Is there a difference between the
> Swedish rating level and the level in other countries?
> It seems so. How does the Swedish level correlate
> to the level in the ELO-system?
>  
> Are some chess programs relatively better or worse
> against humans than other programs? It could be so.
> But in order to prove that, you would have to play
> hundreds of games against humans with several of
> the chess programs, and this has not been done.
>  
> Is an improvement of, say, 400 rating points on
> the SSDF rating list comparable with the same
> improvement against humans? Or could there be a
> "spreading" effect when games between computers
> are played?
>  
> The results in the Aegon-tournament 1992 - 1997,
> where up to 300 games against humans were
> played each year, showed ratings slightly above
> what could be expected from the rating list. This
> indicates that the level of the list is rather
> correct, but doesn't prove it. The playing tempo
> in Aegon was somewhat faster than tournament play,
> and the humans didn't have Swedish ratings.


            TOP 12 FOR THE MAIN-COMPUTER PROGRAMMES
                         1991-1997
  ----------------------------------------------------------------
   1. Rebel                  +21    = 8    - 7     25/36   = 69.4%
   2. Chess Genius           +15    = 9    - 6    19½/30   = 65.0%
   3. Chessica (Fritz)       + 9    = 5    - 4    11½/18   = 63.8%
   4. Hiarcs                 +16    = 6    - 8     19/30   = 63.3%
   5. The King (Chessmaster) +22    =14    -12     29/48   = 60.4%
   6. M-Chess Pro            +26    = 5    -17    28½/48   = 59.3%
   7. Chessmaster (The King) + 8    = 5    - 5    10½/18   = 58.3%
   8. Virtual Chess          + 9    = 3    - 6    10½/18   = 58.3%
   9. Fritz                  +14    =13    - 9    20½/36   = 56.9%
  10. Quest (Fritz)          +15    =11    -10    20½/36   = 56.9%
  11. Kallisto               +17    = 6    -13     20/36   = 55.5%
  12. Nimzo                  +12    = 6    -12     15/30   = 50.0%
  ----------------------------------------------------------------

This AEGON statistic more or less implies the opposite: that you can't compare
comp-comp results with human-comp results.
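
As a side note on reading this table: the score percentages convert to
approximate Elo performance differences via the standard logistic Elo formula.
A minimal Python sketch (this is the usual Elo expectancy curve inverted;
whether AEGON's organisers computed performances exactly this way is an
assumption, and the numbers are rough since the opposition differed per
program):

import math

def elo_diff(score):
    """Invert the Elo expectancy E = 1 / (1 + 10**(-d / 400)) to get the
    performance difference d (in rating points) from a score fraction."""
    return -400.0 * math.log10(1.0 / score - 1.0)

# A few rows from the AEGON table above: (program, wins, draws, losses)
results = [
    ("Rebel",        21,  8,  7),
    ("Chess Genius", 15,  9,  6),
    ("Fritz",        14, 13,  9),
    ("Nimzo",        12,  6, 12),
]

for name, wins, draws, losses in results:
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    print(f"{name:12s} {100 * score:5.1f}%  ~{elo_diff(score):+4.0f} Elo vs field")

Rebel's 69.4% comes out around +143 Elo over the human field it faced, Fritz's
56.9% around +48, and Nimzo's 50.0% exactly even, which underlines how
different this ordering is from the comp-comp list.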


>
> More strictly speaking, the SSDF rating list
> provides information about the relative ratings
> for chess programs, when they are tested in the
> way SSDF has chosen.


You provide comp-comp information only.


>
>  
> Could the result be different with another testing
> method? Yes, it could. We are presenting results
> on the tournament level, 40 moves in two hours.
> Games played on the blitz level or with one minute
> per move would probably give a different order of
> the programs.
>  
> The same would most likely happen if we had used
> a faster or different processor. Some programs
> run relatively better on an AMD processor with
> 64 KB level-1 cache than on a P200 MMX.
>  
> Programs with good learners benefit from long
> series of games against the same opponent. If
> we somehow could automatically change the opponent
> after each game, the result might differ.
>  
> Obviously, the SSDF rating list doesn't say
> anything about programs which haven't been tested
> by us. Our ambition is to include all of the
> strongest commercially available programs.
> Sometimes we also have the opportunity to test
> amateur programs.
>  
> At the moment some of the stronger programs are
> missing because of legal (perhaps even illegal?)
> threats. Another reason for programs not being
> included is that they cannot be tested
> automatically with auto232; most testers are
> not willing to go back to time-consuming
> manual testing.


If you start testing programs in a fair way (that is, admit and correct
mistakes when they are made), some producers may reconsider their policy.

SSDF Rebel9 testing.............

#1. Over 100 games were played without the main book loaded. As a result,
Rebel9 played with a book of just 4.5 KB instead of the main book of 400 KB.

#2. You did not test Rebel9 on its strongest settings. In more than 100 games
Rebel9 played without its learner. I stopped counting after 100 games.


>
> Thanks to four of SSDF's testers, we can now
> include one of these programs without auto232,
> which exactly reached the minimum number of
> games to be accepted. It is Johan de Koning's
> Chessmaster 6000 P200 MMX, which after one
> hundred games takes a shared first place on
> the list with 2576!!
>  
> Compared to the earlier program version, CM5000,
> which we tested on P90, the new program seems
> to be about 110 points stronger! I can not deny
> that the result surprised me somewhat. Still,
> the rating is based on only 100 games with a
> margin of error of 70-80 points, so much could
> change if we succeed in playing more games.
>  
> Fritz 5.32 has gone up one point, which gives it
> a shared first place together with CM6000.
> Hiarcs 7.0 P200 MMX has lost 9 points and now
> takes third place.
>  
> Chrilly Donninger's program Nimzo 99 P200 MMX is new
> on the list. After 390 games it has a rating of
> 2565, which gives it a fourth place. Compared to
> Nimzo 98, the rating has gone up by 42 points!
>  
> Most of the games were played with a program
> version dated 98-12-14, the rest with a version
> from 98-11-01. The option of telling Nimzo 99
> that it is playing against Rebel, Junior or Fritz
> has been used in one or two of the matches.


Also unfair.

It is amazing that you have included these results.

The next step (similar to telling Nimzo99 its opponent) is to allow an
"opponent-specific book" built out of won autoplay games to be loaded.

Please explain what this has to do with your statement above:

[ BEGIN ]

We have chosen to play matches with 20-40 games
between a new program and as many as possible of
the earlier tested programs. The goal is to get
about 600 - 800 tournament games for each program.
This gives us a relative rating, which with
95% probability is within a range of +-25-30 points.

[ END ]
 
Ed Schroder


>
>  
> Junior 5 has lost 11 points compared to the
> previous list, and Atlanta has gained 9 points.
>  
> For the near future we intend to play more games
> with Nimzo 99 and CM6000 and probably start
> with Crafty. As soon as we decide on the next
> hardware level, the best programs will be
> retested on the faster processor.
>  
> Next list will appear in May.
>  
> Thoralf Karlsson

---------------------------------
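
As a rough sanity check on the two error margins quoted in this exchange
(+-25-30 points at 600-800 games, 70-80 points at 100 games), here is a
minimal Python sketch using a simple binomial model of game outcomes. This is
only an approximation and not SSDF's actual procedure: it ignores draws (which
lower the variance), learning effects, and the structure of the opponent pool.

import math

def elo_margin_95(n_games, score=0.5):
    """95% confidence margin (in Elo points) for a rating estimated from
    n_games, modelling each game as an independent win/loss trial."""
    sigma_p = math.sqrt(score * (1.0 - score) / n_games)  # std dev of score fraction
    # Convert score uncertainty into rating uncertainty via the slope of
    # the Elo expectancy curve E = 1 / (1 + 10**(-d / 400)) at this score.
    slope = math.log(10) / 400.0 * score * (1.0 - score)
    return 1.96 * sigma_p / slope

for n in (100, 400, 700):
    print(f"{n:4d} games: +/- {elo_margin_95(n):3.0f} Elo (95%)")

This gives roughly +-68 Elo at 100 games and +-26 at 700, so both figures
quoted above (70-80 points for CM6000's 100 games, +-25-30 for a full
600-800 game run) are consistent with each other under this simple model.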


