Computer Chess Club Archives



Subject: Re: Comments of latest SSDF-Thanks (nt)

Author: Chris Carson

Date: 16:36:47 05/31/02


On May 31, 2002 at 18:03:47, Bertil Eklund wrote:

>Answer to Rolf Tueschen
>
>First of all, you promised not to answer my contribution.
>Anyway, I'm sorry about all the hard work this has caused you.
>
>Here is my (slightly forced) answer.
>
>Since I had promised a few people to write a critical summary of the SSDF
>ranking, I started with a German version. From this article in Rolfs Mosaik (it
>is number 8 there) I will quote the following questions. The problem is that
>the critique is rather short in effect, but for most of the aspects I have no
>exact information, which is why I wrote the nine questions as the beginning of
>a communication. My verdict, however, is already that the list has no validity.
>The whole presentation has a long tradition but no rational meaning. However,
>SSDF could well make several changes and give the list a better foundation.
>
>[This is the final part of article number 8]
>
>My translation:
>
># Stats can only help to establish roughly correct numbers on a valid basis;
>without validity, the Elo numbers resemble the fata morgana that appears to
>those who are thirsty in the desert. [Footnote: In my first part I explained
>that the typical Elo numbers of 2500, 2600 or 2700 are calibrated against a big
>pool of human players, not just 15 or 20 players! So SSDF simply has no
>validity at all.]
>
># What is wrong with the SSDF stats besides the lack of validity?
>
># To answer this, we clarify what is characteristic of a chess program.
>
># Hardware
>  Engine
>  Books
>  Learning tool
>
># What is necessary for a test experiment?
>Briefly: control of these four factors/parameters.
>
># But first we define what we want to measure, or rather what the result
>should be.
>
># We want to know how successfully the combination of Hardware, Engine, Books
>and Learning tool plays. Successful play is called strength.
>
># Here follows a list of simple questions.
>
># 1) SSDF equips the new programs each time with the fastest hardware. Do we
>find out this way whether the new engine is stronger than the old one? No!
>Quite simply because the old engines could be as strong or stronger on the new
>hardware.
>
>Usually the "best" engines are tested on both new and old hardware.
>
># 2) What is a match between a (new) program and an old program that is weaker
>in all four of the factors above good for? How could we find out which factor
>in the new program is responsible for the difference in strength? We couldn't
>know!
>
>If you and other reactionary people had been in charge, we would still be using
>extremely limited books and programs with no learning. We would also have to
>wait a year or so until enough "new" programs are out to compete on the new
>hardware. Do you also think Kasparov shouldn't play against an opponent 100 Elo
>weaker than himself? Do you have any idea of how the Elo system works? Did you
>know that you can calculate the ratings whether you play against an opponent 30
>Elo above your rating or 150 Elo below it? Obviously not.
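
For reference, here is a minimal sketch of the Elo machinery being argued about:
the standard logistic expected-score formula and the per-game update (Python;
the K-factor of 10 is purely illustrative, not necessarily what SSDF uses). The
calculation is equally well defined whether the opponent is rated above or
below you; only the expected score changes.

    def expected_score(rating, opp_rating):
        """Expected score under the Elo model (logistic curve, scale 400)."""
        return 1.0 / (1.0 + 10 ** ((opp_rating - rating) / 400.0))

    def update(rating, opp_rating, score, k=10):
        """New rating after one game; score is 1 (win), 0.5 (draw), 0 (loss)."""
        return rating + k * (score - expected_score(rating, opp_rating))

    print(expected_score(2700, 2730))  # ~0.457 vs. an opponent 30 points above
    print(expected_score(2700, 2550))  # ~0.703 vs. an opponent 150 points below
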
>
># 3) If as a result one program is 8 "Elo points" stronger, how could we know
>that this is not caused by the different opponents? We couldn't know.
>
>No, we can't, but in general it is much more exact than the rating of a human
>who maybe plays 40 games a year, in the same town, against the same opponents
>several times.
>
># 4) How could we know whether a result with a difference of 8 points won't
>exactly reverse the ranking of a pair of programs after some further 20 games
>each? We couldn't know that.
>
>No, we can't. So what?! Try comparing with the human Elo list. The only thing
>we know is that the human list is much more uncertain.
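
To put a number on question 4, here is a quick Monte Carlo sketch (my own
illustration, not anyone's official method; draws are ignored, which if
anything overstates the noise): two programs of identical strength play 40
games, and we count how often the measured gap still exceeds 8 Elo points in
one direction or the other.

    import math
    import random

    def measured_elo(n_games, p_win=0.5):
        """Elo gap implied by the score of n_games independent games."""
        score = sum(random.random() < p_win for _ in range(n_games))
        p = min(max(score / n_games, 0.01), 0.99)  # keep the log finite
        return -400.0 * math.log10(1.0 / p - 1.0)

    random.seed(1)
    trials = 10_000
    flips = sum(abs(measured_elo(40)) > 8 for _ in range(trials))
    print(flips / trials)  # ~0.87: an 8-point gap is well inside the noise
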
>
># 5) SSDF does not suppress games of a match, but it does feed a match with
>only 5 games played into the calculation of the Elo numbers and then continues
>the rest of the match for the next publication. How could we know that this
>effect does not influence the result of the current edition? We couldn't know!
>
>Of course it influences the results in one way or another. Did you know that
>there are deadlines for the human list too?
>
># 6) SSDF often matches the newest programs against ancient ones. Why? Because
>variability in the choice of opponents is important for the calculation of Elo
>numbers? Does Kasparov therefore play against a master of about Elo 2350? Of
>course not! Such nonsense is not part of human chess [as a necessity of Elo
>numbers!]. Or is it that the lacking validity of the computer ratings should be
>compensated by play against the weakest and most helpless opponents? We don't
>know.
>
>All new programs play against a pool of one or two dozen programs, which could
>be more than Kasparov faces! Every program plays against its predecessor (if
>any). Are you sure that it is better to play against an opponent 150 Elo weaker
>than you than against an equal opponent? Do you understand the Elo system?
>
># 7) Why does SSDF present a rank difference of 8 points, as in May 2002, or
>earlier even of 1 point, if the margin of error is +/- 30 points and more? Is
>it possible to detect a difference between two such programs at all? No! SSDF
>presents differences which possibly do not exist in reality, because they
>cannot be resolved given the uncertainty or unreliability of the measurement
>itself. So, can we believe the SSDF ranking list? No. [Not in its presented
>form.]
>
>So? Should the result not be presented if the difference between programs A
>and B (in the above example) is less than 60 Elo?
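
For scale, the same point can be made analytically. Here is a rough sketch
(again my own illustration, not SSDF's published error model) of an approximate
95% interval for the Elo difference implied by a score fraction p over n games.

    import math

    def elo_diff(p):
        """Elo difference implied by a score fraction p (0 < p < 1)."""
        return -400.0 * math.log10(1.0 / p - 1.0)

    def elo_interval_95(p, n):
        """Approximate 95% interval for the Elo difference after n games."""
        se = math.sqrt(p * (1.0 - p) / n)  # standard error of the score
        return elo_diff(p - 1.96 * se), elo_diff(p + 1.96 * se)

    lo, hi = elo_interval_95(0.51, 100)
    print(round(elo_diff(0.51)), round(lo), round(hi))  # ~7, -62, 76

Even after 100 games, a 51% score (about +7 Elo) is statistically
indistinguishable from zero, which is the substance of the +/- 30 complaint;
whether such numbers should be printed at all is exactly where the two sides
disagree.
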
>
># 8) SSDF publishes only results, and implies in short commentaries what should
>be tested next, but details about the test design remain unknown. What are the
>conditions of the tests? We don't know.
>
>You know that we answer all such questions personally, here, or in other
>forums.
>
># 9) How many testers does SSDF actually have? 10 or 20? No. I have
>confidential information that perhaps a handful of testers are doing the main
>job. Where are all the amateur testers in Sweden? We don't know.
>
>What's the problem if there are 5, 10 or 15 testers? Is it better if there are
>20, or maybe 24?
>
>This list of questions could be continued if necessary.
>
>So, what is the meaning of the SSDF ranking list? Perhaps mere PR, because the
>winning program or the trio of winners could increase its sales figures.
>Perhaps the programmers themselves are interested in the list. We don't know.
>
>The only meaning is one you can't understand: pure love of and interest in
>computer chess. Can you maybe remember the time when the only buying advice was
>the advertisements from, for example, Fidelity, or from extremely blind persons
>like a few in this forum? Or from the many renowned persons here who believe
>that the best program wins the computer chess WM (the same persons who also
>claim that they understand statistics)?
>
>[Actually this ranking is unable to answer our questions about strength.]
>
>[You could read my whole article (number 8) in German at
>http://members.aol.com/mclanecxantia/myhomepage/rolfsmosaik.html]
>
>I hope to try it, but for personal reasons I am very busy at the moment.
>
>Bertil


