Computer Chess Club Archives


Subject: Comments of latest SSDF list-Nine basic questions

Author: Bertil Eklund

Date: 15:03:47 05/31/02


Answer to Rolf Tueschen

First of all, you promised not to answer my contribution.
Anyway, I am sorry for all the hard work you put into this.

Here is my (slightly forced) answer.

Since I had promised a few people to write a critical summary about the SSDF
ranking, I started with a German version. From this article in Rolfs Mosaik (it is
number 8 there) I will quote the following questions here. The problem is that
the critique is rather short in effect, and for most of the aspects I have no
exact information, which is why I wrote the nine questions as the beginning of a
discussion. My verdict, however, is already that the list has no validity. The
whole presentation has a long tradition but no rational meaning. However, SSDF
could well make several changes and give the list a better foundation.

[This is the final part of article number 8]

My translation:

# Statistics could only help to establish roughly correct numbers on a valid basis,
but without validity the Elo numbers resemble the
fata morgana that appears to the thirsty in the desert. [Footnote: In
my first part I explained that the typical Elo numbers of 2500, 2600 or 2700
are calibrated against human players, a big pool of human players, not just 15 or 20
players! So SSDF simply has no validity at all.]

# What is wrong in the SSDF stats besides the lack of validity?

# To answer this we clarify what is characteristic of a chess program.

# Hardware
  Engine
  Books
  Learning tool

# What is necessary for a test experiment?
Briefly: the control of these four factors/parameters.

# But first we define what we want to measure, or rather what the result should
be.

# We want to know how successfully the combination of Hardware, Engine, Books
and Learning tool plays. Successful play is called strength.

# Here follows a list of simple questions.

# 1) SSDF each time equips the new programs with the fastest hardware. Do we
find out this way whether the new engine is stronger than the older one? No! Quite
simply because the old engines could be just as strong or stronger on new hardware.

Usually the "best" engines are played on both new and old hardware.

# 2) What is a match good for between a (new) program and an old program that is
weaker in all four factors above? How could we find out which factor in the
new program is responsible for the difference in strength? We couldn't know!

If you and other reactionary people had been in charge, we would still be using
extremely limited books and programs with no learning. We would also have to wait
a year or so until enough "new" programs were out to compete on the new hardware.
Do you also think Kasparov shouldn't play against an opponent 100 Elo weaker
than himself? Do you have any idea how the Elo system works? Did you know that
you can calculate ratings both when you play against an opponent 30 Elo
above your rating and when you play against one 150 Elo below it? Obviously not.
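
To make the point concrete, here is a minimal sketch of the standard Elo
expectation and update formulas in Python. The K-factor and the example ratings
are my own illustrative choices, not anything taken from the SSDF calculation.

# Minimal sketch of the standard Elo formulas; K-factor and example
# ratings are illustrative assumptions, not SSDF's actual procedure.

def expected_score(own, opp):
    """Expected score (0..1) against an opponent, per the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((opp - own) / 400.0))

def update(own, opp, score, k=16):
    """New rating after one game; score is 1, 0.5 or 0."""
    return own + k * (score - expected_score(own, opp))

# The same formula handles an opponent 30 points above and one 150 points
# below; only the expected score changes.
print(round(expected_score(2500, 2530), 3))  # about 0.457
print(round(expected_score(2500, 2350), 3))  # about 0.703
print(round(update(2500, 2350, 1.0), 1))     # about 2504.7: small gain for a win over a weaker opponent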

# 3) If as a result one program is 8 "Elo points" stronger, how could we know
that this is not caused by the different opponents? We couldn't know.

No, we can't. But in general it is much more exact than the rating of a human
who maybe plays 40 games a year, in the same town, against the same opponents
several times.

# 4) How could we know whether a result with a difference of 8 points won't turn
the ranking of each pair of programs exactly around after a further 20
games each? We couldn't know that.

No, we can't. So what?! Try comparing with the human Elo list. The only thing
we know is that the human list is much more uncertain.
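
For a sense of scale, here is a back-of-the-envelope sketch of how noisy a
rating based on only a handful of games is. The draw rate and the "7 Elo per 1%
of score" slope are my own assumptions, not SSDF figures.

# Rough estimate of the 1-sigma noise, in Elo points, of a rating measured
# from a given number of games between roughly equal opponents.

import math

def elo_noise(games, draw_rate=0.35):
    p_win = (1 - draw_rate) / 2
    var_game = 2 * p_win * 0.25          # wins/losses deviate from 0.5 by 0.5
    se_score = math.sqrt(var_game / games)
    return se_score * 100 * 7            # near equality, 1% of score is ~7 Elo

for n in (20, 100, 1000):
    print(n, round(elo_noise(n), 1))
# 20 games   -> about 63 Elo of noise
# 100 games  -> about 28
# 1000 games -> about 9

With only 20 games an 8-point gap is deep inside the noise; the numbers only
become meaningful once hundreds of games have accumulated.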

# 5) SSDF does not suppress games of a match, but it does take a match with
only 5 games played into the calculation of the Elo numbers and continues the rest
of the match for the next publication. How could we know that this does
not influence the result of the current edition? We couldn't know!

Of course it influences the results in one way or another. Did you know that
there are deadlines for the human list too?

# 6) SSDF often matches the newest programs against ancient programs. Why? Because
variability in the choice of opponents is important for the calculation of
Elo numbers? Does Kasparov therefore play against a master of about Elo
2350? Of course not! Such nonsense is not part of human chess [as a necessity of
Elo numbers!]. Or is it that the lacking validity of the computer ratings should be
compensated by play against the weakest and most helpless opponents? We don't know.

All new programs play against a pool of one or two dozen programs, which could be
more than Kasparov faces! Every program plays against its predecessor (if any). Are you
sure that it is so much better to play against an equal opponent than against one
150 Elo weaker? Do you understand the Elo system?

# 7) Why does SSDF present a rank difference of 8 points, as in May 2002, or
earlier even of 1 point, if the margin of error is +/- 30 points and more? Is it
possible to detect a difference between such programs at all? No! SSDF
presents differences which possibly do not exist in reality, because they cannot
be established on account of the uncertainty or unreliability of the measurement
itself. So, can we believe the SSDF ranking list? No. [Not in its presented
form.]

So? If the difference between program A and B (in the above example) is less
than 60 Elo, the result shouldn't be presented at all?
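
Just to show what the error-margin argument amounts to, here is a small sketch.
Treating the +/-30 from the question as a rough 1-sigma error is my assumption.

# Is an 8-point rating gap meaningful when each rating carries an error of
# roughly +/-30 points?

import math

def gap_vs_noise(rating_a, rating_b, err_a=30.0, err_b=30.0, z=1.96):
    """Return (gap, approximate 95% noise bound on the difference of two ratings)."""
    gap = abs(rating_a - rating_b)
    noise = z * math.sqrt(err_a ** 2 + err_b ** 2)
    return gap, round(noise, 1)

print(gap_vs_noise(2650, 2642))   # (8, 83.2): the gap is far inside the noise bound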

# 8) SSDF publishes only results and hints in short commentaries at what should
be tested next, but details about the test design remain unknown. What
are the conditions of the tests? We don't know.

You know that we answer all such questions, whether personally, here, or in other
forums.

# 9) How many testers does SSDF actually have? 10 or 20? No. I have confidential
information that perhaps a handful of testers do the main job. Where are
all the amateur testers in Sweden? We don't know.

What's the problem if it is 5, 10 or 15 testers? Is it better if it is 20 or
maybe 24?

This list of questions could be continued if necessary.

So, what is the meaning of the SSDF ranking list? Perhaps mere PR, because the
winning program or the trio of winners could increase its sales figures.
Perhaps the programmers themselves are interested in the list. We don't know.

The only point is that you can't understand the pure love of and interest
in computer chess. Can you perhaps remember the time when the only buying advice
was the advertisements from, for example, Fidelity, or from extremely blind persons
like a few in this forum? Or the many renowned persons here who believe that the
best program wins the computer chess World Championship (the same persons who also
claim that they understand statistics).

[Actually this ranking is unable to answer our questions about strength.]

[You could read my whole article (number 8) in German at
http://members.aol.com/mclanecxantia/myhomepage/rolfsmosaik.html]

Hopefully I will try it, but for personal reasons I am very busy at the
moment.

Bertil






