Computer Chess Club Archives


Subject: Re: LCTII test vs SSDF results

Author: Bruce Moreland

Date: 15:55:12 09/27/99


On September 27, 1999 at 14:33:34, Georg v. Zimmermann wrote:

>I'm not sure whether I completely understand your post. Thanks for the Crafty
>tip, btw.
>
>I remember that one of the positional LCTII positions is solved by doctor3 in 5
>seconds and never solved by Fritz, and another is solved by Hiarcs but not by
>the others.
>These positional moves are clearly (to a strong human) the best ones in the
>given positions. Doesn't this test chess knowledge, not speed?

Here is my point.  Let's say that you play 10,000 round robin tournaments
between programs A, B, C, and D, and you get the following estimated ratings:

A  2400
B  2425
C  2450
D  2475

I can build a test that will allow you to predict the ratings of these programs.
The test is:

    Elo = 2375 + N * 25

Where N is the ordinal value of the program's name (A=1, B=2, C=3, D=4).

This formula predicts the ratings of each program with absolute accuracy.  If
you do this test at home, you will be able to predict the ratings yourself, and
it will always work.
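To make the joke test concrete, here is a minimal Python sketch of it, assuming single-letter program names as in the example. The names `name_ordinal`, `name_elo`, and `observed` are made up for illustration; the ratings are the hypothetical round-robin results above.

```python
# The "rating test": Elo depends only on the program's name (A=1, B=2, ...),
# not on how well the program plays.

def name_ordinal(name: str) -> int:
    """Ordinal of a single-letter program name: 'A' -> 1, 'B' -> 2, ..."""
    return ord(name.upper()) - ord("A") + 1


def name_elo(name: str) -> int:
    """Elo = 2375 + N * 25, where N is the name's ordinal."""
    return 2375 + name_ordinal(name) * 25


# Hypothetical ratings estimated from the 10,000 round robin tournaments.
observed = {"A": 2400, "B": 2425, "C": 2450, "D": 2475}

for program, elo in observed.items():
    # Prints the program, the "predicted" Elo, and the measured Elo.
    print(program, name_elo(program), elo)
```

Every row matches exactly, which is the whole point: a perfect fit says nothing about whether the test measures strength.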

Assume this test is done on a 200 MHz computer, and you guess that doubling
speed is worth 100 Elo points.  Then the formula can be modified as follows:

    Elo = 2375 + N * 25 + log2(MHz / 200) * 100

At 200 MHz the log term is zero, so this produces the same values as before, only
it will also accurately predict the rating if you increase processor speed.

You can't argue with this test: it is a perfect predictor for these four
programs on any computer, assuming that 100 points per doubling holds up.
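A sketch of the speed-adjusted version, under the same hypothetical assumptions: the ratings were measured at 200 MHz and each doubling of clock speed is worth 100 Elo. The constant 2375 is chosen so that the 200 MHz case reproduces the original 2400 to 2475 values, since log2(200 / 200) = 0. `name_elo_at_speed`, `BASE_MHZ`, and `ELO_PER_DOUBLING` are illustrative names, not anything from a real tool.

```python
import math

# ASSUMPTIONS (the post's own hypothetical): ratings measured at 200 MHz,
# and each doubling of clock speed is worth 100 Elo.
BASE_MHZ = 200
ELO_PER_DOUBLING = 100


def name_elo_at_speed(name: str, mhz: float) -> float:
    """Elo = 2375 + N * 25 + log2(mhz / 200) * 100, with N = 1 for 'A', 2 for 'B', ..."""
    n = ord(name.upper()) - ord("A") + 1
    return 2375 + n * 25 + math.log2(mhz / BASE_MHZ) * ELO_PER_DOUBLING


for mhz in (200, 400, 800):
    # Each doubling shifts every program up by the same 100 points.
    row = {p: name_elo_at_speed(p, mhz) for p in "ABCD"}
    print(mhz, row)
```

The formula stays a "perfect predictor" at any clock speed, yet the name term still carries all of the supposed strength information.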

Why won't anyone agree that this is a good test?  Because the test has no
relevance to the strength of the programs.  I couldn't get anyone to agree that
what you call your program matters enough that it affects the rating like this.

But look at these suite tests.  They are also created using a fixed set of
programs, on a particular processor, and calibrated to a scale that has been
predetermined (typically the SSDF list, or someone's feelings about what the
SSDF list should really show).  Do you think that this isn't done?  I can't
imagine someone just doing a test and picking a formula at random and magically
the right Elo numbers come out.  No, the test and the formula are both
calibrated against some predefined reality.

When you run one of these suites at home, on one of the programs that the test
was calibrated with, you are just replaying the number that the suite author
determined that the program should get.  It's not predicting anything; the whole
test is a recording of predetermined "facts" for many of the most popular
programs.

There is some question about whether the suite can be accurate for programs that
it wasn't calibrated with, but since most of these suites probably use the most
popular programs for calibration, you can't really compare anything.  If one of
the calibration programs comes back at 2475, and some new program comes back at
2450, who is to say that the new program is really weaker, since the test has
been fiddled with until it produces 2475 for the first program?
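Here is a hypothetical Python sketch of the calibration problem. The suite scores, the `CALIBRATION` table, the `calibrated_elo` function, and the choice of piecewise-linear interpolation are all invented for illustration. This is not how LCTII or any particular suite computes its numbers. It only shows that once the score-to-Elo mapping is tuned through the calibration programs, running one of those programs simply replays its chosen rating, and a new program's number is only defined relative to those hand-picked anchors.

```python
from bisect import bisect_left

# HYPOTHETICAL data: a suite score for each calibration program (e.g. positions
# solved) paired with the rating the suite author decided it should get.
# Scores must be strictly increasing.
CALIBRATION = [(18, 2400), (21, 2425), (22, 2450), (26, 2475)]


def calibrated_elo(score: float) -> float:
    """Map a suite score to Elo by piecewise-linear interpolation through the
    calibration points, extrapolating along the first/last segment outside them."""
    scores = [s for s, _ in CALIBRATION]
    elos = [e for _, e in CALIBRATION]
    # Pick the segment [i - 1, i] containing the score (or the nearest end segment).
    i = min(max(bisect_left(scores, score), 1), len(scores) - 1)
    s0, s1 = scores[i - 1], scores[i]
    e0, e1 = elos[i - 1], elos[i]
    return e0 + (e1 - e0) * (score - s0) / (s1 - s0)


# "Replaying" a calibration program returns exactly the rating it was tuned to.
for score, target in CALIBRATION:
    print(score, calibrated_elo(score), target)

# A new program is only placed relative to those hand-chosen anchors.
print(calibrated_elo(24))  # → 2462.5
```

Move any anchor and every new program's "rating" moves with it, while the calibration programs keep coming back with exactly the numbers they were assigned.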

bruce



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.