Author: Bertil Eklund
Date: 15:03:47 05/31/02
Answer to Rolf Tueschen

First of all, you promised not to answer my contribution. Anyway, I'm sorry for your hard work with all this. Here is my (slightly forced) answer.

Since I had promised a few people to write a critical summary about the SSDF ranking, I started with a German version. From this article in Rolf's Mosaik (it is number 8 there) I'll quote the following questions. The problem is that the critique is rather short in effect, but for most of the aspects I have no exact information; that is why I wrote the nine questions as the beginning of a communication. My verdict, however, is already that the list has no validity. The whole presentation has a long tradition but no rational meaning. However, SSDF could well make several changes and give the list a better foundation. [This is the final part of article number 8.]

My translation:

# Statistics could only help to establish roughly correct numbers on a valid basis, but without validity the Elo numbers resemble the fata morgana that appears to the thirsty in the desert. [Footnote: In my first part I explained that the typical Elo numbers of 2500, 2600 or 2700 are calibrated against human players, a big pool of human players, not just 15 or 20 players! So SSDF simply has no validity at all.]

# What is wrong in the SSDF statistics besides the lacking validity? To answer this, we clarify what is characteristic for a chess program: hardware, engine, books, learning tool. What is necessary for a test experiment? Briefly: control of these four factors/parameters. But first we define what we want to measure, or rather what the result should be. We want to know how successfully the combination of hardware, engine, books and learning tool plays. Successful play is called strength. Here follows a list of simple questions.

# 1) SSDF equips the new programs each time with the fastest hardware. Do we find out this way whether the new engine is stronger than the older one? No!
Quite simply because the old engines could be as strong or stronger on new hardware.

Usually the "best" engines are played on both new and old hardware.

# 2) What is a match good for between a (new) program and an old program which is weaker in all four factors from above? How could we find out which factor in the new program is responsible for the difference in strength? We couldn't know!

If you and other reactionary people had been in charge, we would still be using extremely limited books and programs with no learning. We would also have to wait a year or so until enough "new" programs are out to compete on the new hardware. Do you also think Kasparov shouldn't play against an opponent 100 Elo weaker than himself? Do you have any idea of how the Elo system works? Did you know that you can calculate the ratings both when you play against an opponent 30 Elo above your rating and against one 150 Elo below it? Obviously not.

# 3) If as a result one program is 8 "Elo points" stronger, how could we know that this is not caused by the different opponents? We couldn't know.

Now we can't, but it is in general much more exact than the rating of a human who maybe plays 40 games a year, in the same town, against the same opponents several times.

# 4) How could we know whether, after some further 20 games each, a result with a difference of 8 points won't exactly reverse the ranks of a pair of programs? We couldn't know that.

Now we can't. So what?! Try to compare with the human Elo list. The only thing we know is that the human list is much more uncertain.

# 5) SSDF is not suppressing games of a match, but it is moving a match with only 5 games into the calculation of the Elo numbers and continuing the rest of the match for the next publication. How could we know that this effect does not influence the result of the current edition? We couldn't know!

Of course it influences the results in some way or another. Did you know that there are deadlines for the human list too?
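Since the workings of the Elo system come up repeatedly above, here is a minimal Python sketch of the standard (logistic) Elo formulas. This is my own illustration, not SSDF's code, and the K-factor of 10 is chosen purely for the example; the point is that rating updates are perfectly well defined whether the opponent is rated above you or 150 points below you.

```python
def expected_score(rating, opponent_rating):
    """Expected score (between 0 and 1) under the classical Elo model."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))

def update(rating, opponent_rating, score, k=10):
    """New rating after one game; score is 1 (win), 0.5 (draw), or 0 (loss)."""
    return rating + k * (score - expected_score(rating, opponent_rating))

# A 2500 player is expected to score about 0.70 against a 2350 opponent,
# so a win gains only about 3 points:
r1 = update(2500, 2350, 1.0)
# A win against an opponent 30 points stronger gains more:
r2 = update(2500, 2530, 1.0)
```

Losing to the weaker opponent costs correspondingly more than losing to the stronger one, which is why playing down is not "free" points in either direction.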
# 6) SSDF often matches the newest programs against ancient programs. Why? Because variability in the choice of opponents is important for the calculation of Elo numbers? So Kasparov should play against a master of about Elo 2350? Of course not! Such nonsense is not part of human chess [as a necessity of Elo numbers]! Or is it that the lacking validity of the computer ratings should be covered up by play against the weakest and most helpless opponents? We don't know.

All new programs play against a pool of one or two dozen programs, which could be more than Kasparov faces! All programs play against their predecessor (if any). Are you sure that it is better to play against an opponent 150 Elo weaker than you than against an equal opponent? Do you understand the Elo system?

# 7) Why is SSDF presenting a rank difference of 8 points, as in May 2002, or earlier even of 1 point, if the margin of error is +/- 30 points and more? Is it possible to discover a difference between such programs at all? No! SSDF is presenting differences which possibly do not exist in reality, because they cannot be resolved on account of the uncertainty or unreliability of the measurement itself. So, could we believe the SSDF ranking list? No. [Not in its presented form.]

So? If the difference between program A and B (in the above example) is less than 60 Elo, the result shouldn't be presented?

# 8) SSDF is publishing only results, implying in short commentaries what should be tested next, but details about the test design remain unknown. What are the conditions of the tests? We don't know.

You know that we answer all such questions personally, here or in other forums.

# 9) How many testers does SSDF actually have? 10 or 20? No. I have confidential information that perhaps a handful of testers are doing the main job. Where are all the amateur testers in Sweden? We don't know.

What is the problem if it is 5, 10 or 15 testers? Is it better if it is 20, or maybe 24?

This list of questions could be continued if necessary.
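On the margin-of-error point in question 7, it is easy to check for yourself how wide the uncertainty really is. Below is a rough Python sketch (my own back-of-envelope illustration using a simple normal approximation on the score fraction; it is not SSDF's actual procedure, and it ignores draw-specific variance corrections):

```python
import math

def elo_diff(p):
    """Elo difference implied by a score fraction p (0 < p < 1)."""
    return -400.0 * math.log10(1.0 / p - 1.0)

def elo_interval(wins, draws, losses, z=1.96):
    """Rough 95% interval for the Elo difference implied by a match,
    treating each game as an independent trial."""
    n = wins + draws + losses
    p = (wins + 0.5 * draws) / n
    se = math.sqrt(p * (1.0 - p) / n)          # standard error of the score
    lo = max(p - z * se, 1e-6)                 # clamp away from 0 and 1
    hi = min(p + z * se, 1.0 - 1e-6)
    return elo_diff(lo), elo_diff(p), elo_diff(hi)

# Example: 55% score over 100 games (50 wins, 10 draws, 40 losses).
# The implied difference is about +35 Elo, but the 95% interval still
# spans well over 100 Elo points.
low, mid, high = elo_interval(50, 10, 40)
```

Even after 100 games, a measured difference of a few dozen points is compatible with the "weaker" program actually being the stronger one, which is the substance of the dispute over whether small rank differences should be printed at all.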
So, what is the meaning of the SSDF ranking list? Perhaps mere PR, because the winning program or the trio of winners could increase its sales figures. Perhaps the programmers themselves are interested in the list. We don't know.

The only meaning is one you can't understand: pure love of and interest in computer chess. Can you maybe remember the time when the only buying advice was the advertisements from, for example, Fidelity, or from extremely blind persons like a few in this forum? Or the many renowned persons here who believe that the best program wins the "computer chess" WM (the same persons who also claim that they understand statistics)?

[Actually this ranking is unable to answer our questions about strength.]

[You can read my whole article (number 8) in German at http://members.aol.com/mclanecxantia/myhomepage/rolfsmosaik.html]

Hopefully I will try it, but for personal reasons I am very busy at the moment.

Bertil