Author: Chris Carson
Date: 16:36:47 05/31/02
On May 31, 2002 at 18:03:47, Bertil Eklund wrote:

>Answer to Rolf Tueschen
>
>First of all, you promised not to answer my contribution.
>Anyway, I am sorry for your hard work with all this.
>
>Here is my (slightly forced) answer.
>
>Since I had promised a few people to write a critical summary of the SSDF
>ranking, I started with a German version. From this article in Rolf's Mosaik
>(it is number 8 there) I will quote the following questions. The problem is
>that the critique is rather short in effect, but for most of the aspects I
>have no exact information, which is why I wrote the nine questions as the
>beginning of a discussion. My verdict, however, is already that the list has
>no validity. The whole presentation has a long tradition but no rational
>meaning. However, SSDF could well make several changes and give the list a
>better foundation.
>
>[This is the final part of article number 8]
>
>My translation:
>
># Stats could only help to establish roughly correct numbers on a valid
>basis, but without validity the Elo numbers resemble the fata morgana that
>appears to those who are thirsty in the desert. [Footnote: In my first part I
>explained that the typical Elo numbers of 2500, 2600 or 2700 are calibrated
>to human players, a big pool of human players, not just 15 or 20 players! So
>SSDF simply has no validity at all.]
>
># What is wrong in the SSDF stats besides the lacking validity?
>
># To answer this we clarify what is characteristic of a chess program:
>
># Hardware
># Engine
># Books
># Learning tool
>
># What is necessary for a test experiment?
>Briefly: control of these four factors/parameters.
>
># But first we define what we want to measure, respectively what the result
>should be.
>
># We want to know how successful the combination of Hardware, Engine, Books
>and Learning tool plays. Successful play is called strength.
>
># Here follows a list of simple questions.
>
># 1) SSDF each time equips the new programs with the fastest hardware. Do we
>find out this way whether the new engine is stronger than the older one? No!
>Quite simply because the old engines could be as strong or stronger on new
>hardware.
>
>Usually the "best" engines are played on both new and old hardware.
>
># 2) What is a match for between a (new) program and an old program which is
>weaker in all four factors above? How could we find out which factor in the
>new program is responsible for the difference in strength? We couldn't know!
>
>If you and other reactionary people had been in charge, we would still be
>using extremely limited books and programs with no learning. We would also
>have to wait a year or so until enough "new" programs are out to compete on
>the new hardware. Do you also think Kasparov shouldn't play against an
>opponent 100 Elo weaker than himself? Do you have an idea of how the Elo
>system works? Did you know that you can calculate the ratings both when you
>play against an opponent 30 Elo above your rating and 150 Elo below your
>rating? Obviously not.
>
># 3) If as a result one program is 8 "Elo points" stronger, how could we
>know that this is not caused by the different opponents? We couldn't know.
>
>No, we can't, but it is in general much more exact than the rating of a
>human who maybe plays 40 games a year, in the same town, against the same
>opponents several times.
>
># 4) How could we know whether the result, with a difference of 8 points,
>won't exactly reverse the ranking of each pair of programs after some
>further 20 games each? We couldn't know that.
>
>No, we can't. So what?! Try to compare with the human Elo list. The only
>thing we know is that the human list is much more uncertain.
>
># 5) SSDF is not suppressing games of a match, but it moves a match with
>only 5 games into the calculation of the Elo numbers and continues the rest
>of the match for the next publication.
>How could we know that this effect does not influence the result of the
>actual edition? We couldn't know!
>
>Of course it influences the results in some way or another. Did you know
>that there are deadlines for the human list too?
>
># 6) SSDF often matches the newest programs against ancient programs. Why?
>Because the variability of the choice of opponents is important for the
>calculation of Elo numbers? Hence Kasparov is playing against a master of
>about Elo 2350? Of course not! Such nonsense is not part of human chess [as
>a necessity of Elo numbers]! Or is it that the lacking validity of the
>computer ratings should be covered by play against the weakest and most
>helpless opponents? We don't know.
>
>All new programs play against a pool of one or two dozen programs, which
>could be more than Kasparov plays! All programs play against their
>predecessors (if any). Are you sure that it is better to play against an
>opponent 150 Elo weaker than you than against an equal opponent? Do you
>understand the Elo system?
>
># 7) Why is SSDF presenting a rank difference of 8 points, as in May 2002,
>or earlier even of 1 point, if the margin of error is +/- 30 points and
>more? Is it possible to discover a difference between the programs at all?
>No! SSDF is presenting differences which possibly do not exist in reality,
>because they cannot be distinguished on account of the uncertainty or
>unreliability of the measurement itself. So, can we believe the SSDF ranking
>list? No. [Not in its presented form.]
>
>So? If the difference between program A and B (in the above example) is
>less than 60 Elo, the result shouldn't be presented?
>
># 8) SSDF publishes only results, implying in short commentaries what
>should be tested next, but details about the test design remain unknown.
>What are the conditions of the tests? We don't know.
>
>You know that we answer all such questions, personally, here, or in other
>forums.
>
># 9) How many testers does SSDF actually have? 10 or 20? No.
>I have confidential information that perhaps a handful of testers are doing
>the main job. Where are all the amateur testers in Sweden? We don't know.
>
>What is the problem if it is 5, 10 or 15 testers? Is it better if it is 20,
>or maybe 24?
>
>This list of questions could be continued if necessary.
>
>So, what is the meaning of the SSDF ranking list? Perhaps mere PR, because
>the winning program, or the trio of winners, can increase its sales figures.
>Perhaps the programmers themselves are interested in the list. We don't
>know.
>
>The only meaning is one that you can't understand: pure love of and
>interest in computer chess. Can you maybe remember the time when the only
>buying advice was the advertisements from, for example, Fidelity, or
>extremely blind persons like a few in this forum? Or the many renowned
>persons here who believe that the best program wins the computer chess
>world championship (the same persons who also claim that they understand
>statistics)?
>
>[Actually this ranking is unable to answer our questions about strength.]
>
>[You can read my whole article (number 8) in German at
>http://members.aol.com/mclanecxantia/myhomepage/rolfsmosaik.html]
>
>Hopefully I will try it, but for personal reasons I am very busy at the
>moment.
>
>Bertil
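Bertil's point in question 2 — that Elo ratings can be updated whether the opponent is 30 points above you or 150 points below — follows directly from the standard Elo formulas. A minimal sketch (the logistic expected-score curve with a 400-point scale is the standard Elo model; the K-factor of 10 and the example ratings are illustrative assumptions, not anything SSDF publishes):

```python
def expected_score(rating, opp_rating):
    """Standard Elo expectation: the score (0..1) a player is predicted
    to average against a given opponent."""
    return 1.0 / (1.0 + 10 ** ((opp_rating - rating) / 400.0))

def update(rating, opp_rating, score, k=10):
    """New rating after one game; score is 1 (win), 0.5 (draw) or 0 (loss)."""
    return rating + k * (score - expected_score(rating, opp_rating))

# The same formula applies against stronger and weaker opponents alike:
print(round(expected_score(2500, 2530), 3))  # vs. an opponent 30 Elo above
print(round(expected_score(2500, 2350), 3))  # vs. an opponent 150 Elo below
```

Beating the weaker opponent gains fewer points than beating the stronger one, which is exactly why a mix of opponents still yields a consistent rating.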
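Question 7's "+/- 30 points and more" margin of error can be made concrete with a back-of-the-envelope calculation. The sketch below (my illustration, not SSDF's actual method) treats each game as an independent trial, converts the score fraction and its 95% band into Elo differences via the inverse of the logistic expected-score curve, and shows that even a 300-game sample leaves a wide interval around an 8-point difference:

```python
import math

def elo_diff(p):
    """Invert the Elo expected-score curve: score fraction -> Elo difference."""
    return -400.0 * math.log10(1.0 / p - 1.0)

def elo_interval(points, games, z=1.96):
    """Rough 95% interval for an Elo difference estimated from a match.

    Treats each game as an independent Bernoulli trial (a draw counted as
    half a point), which slightly overstates the variance when many games
    are drawn -- good enough for an order-of-magnitude check.
    """
    p = points / games
    se = math.sqrt(p * (1.0 - p) / games)          # standard error of the score
    lo = elo_diff(max(p - z * se, 1e-9))           # clamp away from 0 and 1
    hi = elo_diff(min(p + z * se, 1.0 - 1e-9))
    return lo, elo_diff(p), hi

# A hypothetical 55% score over 300 games:
lo, mid, hi = elo_interval(165, 300)
print(round(lo), round(mid), round(hi))
```

The point estimate is about 35 Elo, but the 95% band spans roughly 80 points — so an 8-point gap between two programs is indeed far inside the measurement noise, whichever side of the argument one takes on whether it should be printed.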
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.