Author: Dann Corbit
Date: 07:56:45 04/11/05
On April 10, 2005 at 12:37:10, Jorge Pichard wrote:
>When will the SSDF upgrade their Hardwares?
>
>If we look at history, the next hardware is usually more than twice as fast as
>its predecessor; Pentium 90 to Pentium 200MMX, 200MMX to Pentium II 450, Pentium
>II 450 to Athlon 1200Mhz. I guess the next processor will be at least 2.5 Ghz
>Athlon. It is about time for the SSDF to Upgrade their Hardware :-)
This stems from a fundamental misunderstanding of SSDF testing.
The older programs on the slower hardware (450 MHz and below) have the most games played.
Therefore, a game against one of them tells us more about the strength of a new
program than a game against the same opponent on faster hardware.
In fact, you would have to completely recalibrate the older programs on the
faster machines to even use the data.
Consider the following:
THE SSDF RATING LIST 2005-02-25   101022 games played by 270 computers
                                        Rating     +     - Games   Won  Oppo
63 SOS 128MB K6-2 450 MHz                 2518    15   -16  2085   37%  2610
47 Junior 6.0 128MB K6-2 450 MHz          2593    15   -15  2144   49%  2598
66 Fritz 5.32 64MB P200 MMX               2494    14   -14  2563   42%  2549
Notice that these engines have a rating of about 2500 and an error bar of about
+/-15 Elo (a window about 30 Elo wide). That means we know the strength of these
engines to an incredibly good degree. We can consider these as micrometers. Now
as a contrast, consider the following:
THE SSDF RATING LIST 2005-02-25   101022 games played by 270 computers
                                        Rating     +     - Games   Won  Oppo
39 Ruffian 2.0.0 256MB Athlon 1200 MHz    2619    54   -55   165   48%  2635
43 Gromit 3.11.9 256MB Athlon 1200 MHz    2607    44   -46   246   43%  2659
60 Crafty 19.17 256MB Athlon 1200 MHz     2522    43   -47   264   31%  2664
For these engines (a group a bit stronger than the first on average) we do not
have many games played (roughly a tenth as many as the first group).
Notice that the error bar is +/-45 to +/-55 Elo, a window about 100 Elo wide.
This is like using a rope with knots tied in it, by comparison. When we play
against these engines, we do not learn nearly so much as we would by playing
against the previous list on much slower hardware.
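To put rough numbers on the micrometer-versus-knotted-rope comparison, here is a minimal Python sketch of how an error bar shrinks with the number of games. It is not the SSDF's actual method, which is not described in this post. It assumes each game is an independent win/loss trial (draws ignored), the standard logistic Elo curve, and a normal-approximation 95% interval. The function names are made up for illustration.

```python
import math

def elo_diff(score):
    """Elo difference implied by a score fraction (0 < score < 1), logistic model."""
    return 400.0 * math.log10(score / (1.0 - score))

def elo_margin(score, games, z=1.96):
    """Approximate (+, -) Elo margins around the rating implied by `score`.

    Assumption: games are independent win/loss trials (draws ignored),
    so the score fraction has a simple binomial standard error.
    """
    se = math.sqrt(score * (1.0 - score) / games)
    # Clamp so the logarithm stays defined for extreme scores.
    hi = min(score + z * se, 1.0 - 1e-6)
    lo = max(score - z * se, 1e-6)
    centre = elo_diff(score)
    return elo_diff(hi) - centre, elo_diff(lo) - centre

# Score and game counts taken from the two lists above.
for name, score, games in [("Junior 6.0", 0.49, 2144),
                           ("Ruffian 2.0.0", 0.48, 165)]:
    plus, minus = elo_margin(score, games)
    print(f"{name:15s} {games:5d} games  +{plus:.0f} {minus:.0f}")
```

Under those assumptions the margins come out at roughly 15 Elo for 2144 games and roughly 53 Elo for 165 games, in the same range as the bars in the two lists. Since the margin shrinks about as 1/sqrt(games), it takes around four times as many games to halve it.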
The whole point of calibration of the list is to find out how strong the other
engines are. As a clarifying point -- imagine playing the first list up above
against Shredder 10.0 on 1.2 GHz. We know before we start that Shredder is
probably going to pound the stuffings out of them. So the result of an
individual contest is not interesting as far as who is going to win the matchup.
However, the exact ratio by which Shredder wins would be intensely
interesting, as it will give us a very good idea about how strong Shredder is.
It is also a good idea to have a spectrum of different strengths in the
contests.
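As a rough illustration of how a lopsided score still pins down a rating, the sketch below applies the usual logistic Elo formula to a score against opponents of known average rating. The opponent average is just the mean of the three ratings in the first list. The 75% score is a made-up number, not a real Shredder result, and `performance_rating` is a hypothetical helper name.

```python
import math

def performance_rating(opponent_avg, score):
    """Rating implied by scoring `score` (0 < score < 1) against opponents
    whose average rating is `opponent_avg`, using the logistic Elo curve."""
    return opponent_avg + 400.0 * math.log10(score / (1.0 - score))

# Ratings of SOS, Junior 6.0 and Fritz 5.32 from the first list.
calibrated = [2518, 2593, 2494]
opponent_avg = sum(calibrated) / len(calibrated)

# Hypothetical score for the new engine; NOT a measured result.
hypothetical_score = 0.75

estimate = performance_rating(opponent_avg, hypothetical_score)
print(f"opponent average {opponent_avg:.0f}, estimate {estimate:.0f}")
```

Because the opponents' ratings are tightly known, nearly all the uncertainty in the estimate comes from the new engine's own score. The formula also becomes very sensitive as the score approaches 100%, which is one reason a spread of opponent strengths is useful.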
In short, the SSDF is doing everything right. The criticism that they receive is
generally due to a lack of understanding about what the experiments hope to
deliver and the means by which they deliver it.
IMO-YMMV
Last modified: Thu, 15 Apr 21 08:11:13 -0700