Computer Chess Club Archives


Subject: Re: It is about time for the SSDF to Upgrade their Oldware :-)

Author: Dann Corbit

Date: 07:56:45 04/11/05


On April 10, 2005 at 12:37:10, Jorge Pichard wrote:

>When will the SSDF upgrade their Hardwares?
>
>If we look at history, the next hardware is usually more than twice as fast as
>its predecessor; Pentium 90 to Pentium 200MMX, 200MMX to Pentium II 450, Pentium
>II 450 to Athlon 1200Mhz. I guess the next processor will be at least 2.5 Ghz
>Athlon. It is about time for the SSDF to Upgrade their Hardware :-)

This stems from a fundamental misunderstanding of SSDF testing.

The older programs on the slower hardware (450 MHz and below) have the most
games played.

Therefore, playing against them tells us more about the strength of a program
than playing against the same programs on faster hardware would.

In fact, you would have to completely recalibrate the older programs on the
faster machines to even use the data.

Consider the following:

      THE SSDF RATING LIST 2005-02-25   101022 games played by  270 computers
                                           Rating    +    -  Games   Won  Oppo
  63 SOS  128MB  K6-2 450 MHz                2518   15  -16   2085   37%  2610
  47 Junior 6.0  128MB K6-2 450 MHz          2593   15  -15   2144   49%  2598
  66 Fritz 5.32  64MB P200 MMX               2494   14  -14   2563   42%  2549

Notice that these engines have a rating of about 2500, and an error bar about
30 Elo wide (roughly +/-15).  That means we know the strength of these engines
very precisely.  We can consider these as micrometers.  Now as a contrast,
consider the following:

      THE SSDF RATING LIST 2005-02-25   101022 games played by  270 computers
                                           Rating    +    -  Games   Won  Oppo
  39 Ruffian 2.0.0  256MB  Athlon 1200 MHz   2619   54  -55    165   48%  2635
  43 Gromit 3.11.9  256MB Athlon 1200 MHz    2607   44  -46    246   43%  2659
  60 Crafty 19.17  256MB Athlon 1200 MHz     2522   43  -47    264   31%  2664

For these engines (a group a bit stronger than the above on average) we do not
have many games played (roughly a tenth as many as the above engines).
Notice that the error bar is 100 Elo or so wide (roughly +/-50).  This is like
using a rope with knots tied in it, by comparison.  When we play against these
engines, we do not learn nearly so much as we would by playing against the
previous list on much slower hardware.
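
As a rough illustration of why the number of games matters so much, here is a
minimal Python sketch.  It rests on assumptions that are not taken from the
SSDF's own documentation: every game is treated as an independent win-or-loss
trial (draws are ignored), and the uncertainty in the score is converted to Elo
through the slope of the standard logistic Elo curve.  The function name
`elo_margin` and the 95% factor `z = 1.96` are choices made for this example
only.

```python
import math


def elo_margin(games, score, z=1.96):
    """Approximate +/- Elo margin for a score fraction over `games` games.

    Assumptions (not from the SSDF): binomial win/loss model with no draws,
    and the logistic Elo curve D = 400 * log10(p / (1 - p)).
    """
    # Standard error of the observed score fraction.
    std_err = math.sqrt(score * (1.0 - score) / games)
    # Slope of the Elo difference with respect to the score fraction.
    elo_per_score = 400.0 / (math.log(10) * score * (1.0 - score))
    return z * std_err * elo_per_score


if __name__ == "__main__":
    # Rows taken from the two lists above: (name, games, score fraction).
    rows = [
        ("SOS on K6-2 450", 2085, 0.37),
        ("Ruffian 2.0.0 on Athlon 1200", 165, 0.48),
    ]
    for name, games, score in rows:
        print(f"{name}: about +/-{elo_margin(games, score):.0f} Elo")
```

Under these assumptions the sketch gives a margin of about 15 Elo for the SOS
row and about 53 Elo for the Ruffian row, close to the bars in the lists.  The
margin shrinks with the square root of the number of games, so it takes four
times as many games to halve it.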

The whole point of calibration of the list is to find out how strong the other
engines are.  As a clarifying point -- imagine playing the first list up above
against Shredder 10.0 on 1.2 GHz.  We know before we start that Shredder is
probably going to pound the stuffings out of them.  So the result of an
individual contest is not interesting as far as who is going to win the matchup.
However, the exact ratio by which Shredder wins would be intensely
interesting, as it will give us a very good idea about how strong Shredder is.
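
To make the "ratio" point concrete, here is a small Python sketch of how a
score percentage against opponents of known strength turns into a rating
estimate.  It assumes the standard logistic Elo formula; whether the SSDF
computes its list in exactly this way is not stated here, and the function name
`performance_rating` is hypothetical.

```python
import math


def performance_rating(opponent_avg, score):
    """Estimate a rating from the average opponent rating and a score fraction.

    Assumes the standard logistic Elo relation
    D = 400 * log10(score / (1 - score)); score must lie strictly in (0, 1).
    """
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return opponent_avg + 400.0 * math.log10(score / (1.0 - score))


if __name__ == "__main__":
    # Rows from the lists above: (name, average opponent rating, score).
    rows = [
        ("SOS on K6-2 450", 2610, 0.37),
        ("Crafty 19.17 on Athlon 1200", 2664, 0.31),
    ]
    for name, oppo, score in rows:
        print(f"{name}: about {performance_rating(oppo, score):.0f}")
```

For the SOS row this lands within a point or two of the listed 2518, and for
the Crafty row within a few points of the listed 2522 (the percentages in the
list are rounded).  The better we know the opponents' ratings, the more the
winning ratio tells us about the new engine.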

It is also a good idea to have a spectrum of different strengths in the
contests.

In short, the SSDF is doing everything right.  The criticism that they receive
is generally due to a lack of understanding about what the experiments hope to
deliver and the means by which they deliver it.

IMO-YMMV



Last modified: Thu, 15 Apr 21 08:11:13 -0700
