Author: Bob Durrett
Date: 05:43:12 02/14/03
On February 14, 2003 at 08:35:29, Albert Silver wrote:

>On February 14, 2003 at 07:10:40, Rolf Tueschen wrote:
>
>>Just to explain some basics for new readers, I show why the whole List is
>>worthless. The rankings are by chance the way they are presented.
>>
>>Since only a few here have basic knowledge in statistics I explain the most
>>apparent things.
>>
>>We are told that for instance the two first programs are separated by 8 points.
>>No matter Stefan get all the credits here for his first place. But is true that
>>Shredder is stronger than Fritz?
>>
>>Here I must tell you that we simply don't know it. The SSDF pretend to know it,
>>but it is NOT true. How can I say such things? Easy! Look at the deviations.
>>These numbers with + or -. We see that most programs have an expected Elo number
>>varying plus/minus of about 30 points! Note, that the Elo minus 5 is as probable
>>as the finally given Elo for the ranking!
>>
>>If you then take a look at the Elo of the opponents in the far right you can see
>>that even for the top programs the SSDF was unable to create equal conditions.
>>Also this influence by different opponents makes the 8 numbers difference at the
>>top meaningless.
>>
>>In sum we can say that the SSDF failed to show - exactly what they pretend to
>>show - the differences between the actual top programs. The SSDF presents a new
>>leader, but that is against its own results! So that the conclusion is allowed
>>that SSDF makes deliberately their own new number 1!
>
>Your comment that being number 1 in the list is not an absolute is completely
>correct. The SSDF doesn't claim it is a statistical absolute either, which is
>why they present the data: rating performance, number of games, AND the error
>margin.
>
>
>     THE SSDF RATING LIST 2003-02-13   90961 games played by 251 computers
>                                             Rating    +    -  Games  Won  Oppo
>                                             ------  ---  ---  -----  ---  ----
>   1 Shredder 7.0 256MB Athlon 1200 MHz        2768   33  -31    547  72%  2606
>   2 Deep Fritz 7.0 256MB Athlon 1200 MHz      2760   29  -28    654  70%  2612
>   3 Fritz 7.0 256MB Athlon 1200 MHz           2740   30  -29    574  64%  2635
>   4 Chess Tiger 15.0 256MB Athlon 1200 MHz    2726   27  -26    704  64%  2623
>
>
>If they present the error margin, doesn't this *clearly* mean that the result
>may be off by that much? However, so far the current performance is 2768 SSDF
>points. How many games does a human play to get their rating? I won't even
>mention the ridiculously low requirement by FIDE to play only 9 games to get a
>first rating. Suppose I had no rating and played 100 games against a 2000 Elo
>player and I scored 75/100. My performance is 2200 exactly. Is it absolute? No,
>there is a good margin of error, yet no one will question the rating and start
>telling me I'm not rated 2200, I'm just rated anywhere between 2140 and 2260. I
>see no difference. They had Shredder 7 play 547 games against other programs,
>and presented the results PLUS the error margin. It *may* still be a fraction
>weaker than Deep Fritz 7, but already it is clear that it performs better than
>Chess Tiger 15 against other computers. But even if another 200 games changed
>the top ratings to Shredder 7 = 2762 and DF7 = 2763 would anyone be so foolish
>as to claim one program is actually any stronger?? I certainly would never think
>of an opponent rated 10 points more as stronger. The fact that two such
>different playing styles achieve almost identical performances shows how rich
>and flexible chess is.
>
>                                         Albert

Excellent points. The "bottom line" is that SSDF presented their findings properly, but the problem is in interpretation. SSDF cannot be held responsible for errors in interpretation.

Bob D.

>
>>
>>(Note please that this is not a political speech, however it is what statistics
>>demands. The SSDF got this criticism so often in the past but they still didn't
>>change their experimental setting.)
>>
>>Rolf Tueschen
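
For anyone who wants to check the arithmetic in the 75/100 example quoted above, here is a minimal sketch. It assumes the standard logistic Elo expectation curve and a plain binomial error on the score (draws are ignored, which tends to overstate the margin). Nothing in this thread says the SSDF computes its margins this way, and the function names are made up for illustration.

```python
import math


def elo_diff(score):
    """Rating difference implied by a score fraction (0 < score < 1),
    assuming the logistic Elo expectation curve."""
    return 400.0 * math.log10(score / (1.0 - score))


def performance(opp_rating, points, games, z=1.96):
    """Return (estimate, low, high) performance ratings.

    Assumption: each game is an independent win/loss trial, so the
    standard error of the score fraction is sqrt(p * (1 - p) / n).
    z = 1.96 corresponds to a roughly 95% interval. The interval
    bounds must stay strictly between 0 and 1 for the log to work.
    """
    p = points / games
    se = math.sqrt(p * (1.0 - p) / games)
    low_p, high_p = p - z * se, p + z * se
    return (
        opp_rating + elo_diff(p),
        opp_rating + elo_diff(low_p),
        opp_rating + elo_diff(high_p),
    )


# The example from the quoted post: 75 points from 100 games vs. a 2000 player.
est, low, high = performance(2000, 75, 100)
print(round(est), round(low), round(high))
```

Under these assumptions the 75% score comes out a little below 2200, and the interval is somewhat wider than the 2140 to 2260 range mentioned in the quote, so the margin has to be read together with the rating, as argued above.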