Computer Chess Club Archives


Subject: Re: Not meaningless - just not absolute (Therefore a fake! see below)

Author: Rolf Tueschen

Date: 06:12:29 02/14/03


On February 14, 2003 at 08:35:29, Albert Silver wrote:

>On February 14, 2003 at 07:10:40, Rolf Tueschen wrote:
>
>>Just to explain some basics for new readers, I show why the whole List is
>>worthless. The rankings are by chance the way they are presented.
>>
>>Since only a few here have basic knowledge of statistics, I will explain the most
>>apparent things.
>>
>>We are told that, for instance, the first two programs are separated by 8 points.
>>No matter, Stefan gets all the credit here for his first place. But is it true that
>>Shredder is stronger than Fritz?
>>
>>Here I must tell you that we simply don't know. The SSDF pretends to know it,
>>but it is NOT true. How can I say such things? Easy! Look at the deviations,
>>the numbers marked + and -. We see that most programs have an expected Elo number
>>varying plus/minus about 30 points! Note that the Elo minus 5 is as probable
>>as the Elo finally given for the ranking!
>>
>>If you then take a look at the Elo of the opponents on the far right, you can see
>>that even for the top programs the SSDF was unable to create equal conditions.
>>This influence of different opponents also makes the 8-point difference at the
>>top meaningless.
>>
>>In sum we can say that the SSDF failed to show exactly what they pretend to
>>show: the differences between the actual top programs. The SSDF presents a new
>>leader, but that is against its own results! So the conclusion is allowed
>>that the SSDF deliberately makes its own new number 1!
>
>Your comment that being number 1 in the list is not an absolute is completely
>correct.

Thank you. I am also pleased to read a message without any insults, and that is
good. We can concentrate on the facts. But as far as I can see, some people don't
like it when we talk about the facts too much.




>The SSDF doesn't claim it is a statistical absolute either,

This is false. The SSDF speaks of a Number One, of a new number one, etc. Do you
want the evidence? ChessBase also printed the same wording in its advertisements!
Still not believing me? It is as if you didn't want to or can't understand what I
am saying. I don't say they are cheaters. I never said these Swedes are not
worthy of being called testers. I say that they make unnecessary mistakes. And I
say that the staff there is simply not listening.

You are right. If I say number one and give the deviations, THEN in reality I am
saying that we have no number one. That is what you should ask the Swedes:
why do they talk such nonsense?




> which is
>why they present the data: rating performance, number of games, AND the error
>margin.

Yes, Albert, I know this, and it's why I am angry. Because it's not sound. If
they did NOT give these details it would be more honest than giving them and
then still claiming a number one program, when there is no such program!




>
>
>     THE SSDF RATING LIST 2003-02-13   90961 games played by  251 computers
>                                           Rating   +     -  Games   Won  Oppo
>                                           ------  ---   --- -----   ---  ----
>   1 Shredder 7.0  256MB Athlon 1200 MHz     2768   33   -31   547   72%  2606
>   2 Deep Fritz 7.0  256MB Athlon 1200 MHz   2760   29   -28   654   70%  2612
>   3 Fritz 7.0 256MB Athlon 1200 MHz         2740   30   -29   574   64%  2635
>   4 Chess Tiger 15.0  256MB Athlon 1200 MHz 2726   27   -26   704   64%  2623
>
>
>If they present the error margin, doesn't this *clearly* mean that the result
>may be off by that much? However, so far the current performance is 2768 SSDF
>points.


Yes, Albert, and yesterday evening, just 4 hours before 2768, they had it the other
way round, and that is the point! I see that you can't admit the consequences of
a factual deliberate presentation. NB a presentation MUST be independent of all
such possibilities, by its very design. As for the argument, which I have heard
often enough from the SSDF, that unfortunately they had to make a cut because of
the date of publication: this is not OK! If they had a date, THEN they
should also tell people that only because of it they had such and such at the
moment. And then they should say, honestly, 1.-3. or the like. But to give the
appearance that Shredder is now FIRST is simply FALSE.
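
To make the "1.-3." suggestion concrete, here is a minimal sketch that turns the quoted ratings and +/- margins into rating bands and reports which ranks each program cannot be separated from. The numbers are copied from the list quoted above; the helper names (interval, overlaps, shared_rank_range) are made up for illustration, and treating two programs as tied whenever their bands overlap is a simplifying assumption, not the SSDF's own method.

```python
# Ratings and error margins copied from the quoted SSDF list (2003-02-13).
# Each entry: (name, rating, plus_margin, minus_margin)
TOP_FOUR = [
    ("Shredder 7.0", 2768, 33, 31),
    ("Deep Fritz 7.0", 2760, 29, 28),
    ("Fritz 7.0", 2740, 30, 29),
    ("Chess Tiger 15.0", 2726, 27, 26),
]


def interval(entry):
    """Return the (low, high) rating band implied by the +/- margins."""
    _, rating, plus, minus = entry
    return rating - minus, rating + plus


def overlaps(a, b):
    """True if the rating bands of two entries share at least one point."""
    a_low, a_high = interval(a)
    b_low, b_high = interval(b)
    return a_low <= b_high and b_low <= a_high


def shared_rank_range(entries, index):
    """Lowest and highest 1-based rank whose band overlaps the given entry."""
    tied = [i + 1 for i, other in enumerate(entries)
            if overlaps(entries[index], other)]
    return min(tied), max(tied)


if __name__ == "__main__":
    for i, entry in enumerate(TOP_FOUR):
        low, high = interval(entry)
        first, last = shared_rank_range(TOP_FOUR, i)
        print(f"{entry[0]:<18} {low}-{high}  rank {first}.-{last}.")
```

Under this overlap rule the band of every one of the four programs touches the band of every other, so on these numbers an honest label would be even wider than "1.-3.".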



>How many games does a human play to get their rating?

That is NOT the point. I will tell you what is also dishonest and false! Talking
about the number of games, didn't you notice that with Fritz 7, which has been on
the scene for such a long time, they played the same number of games as with the
two new entries, Deep Fritz 7 and Shredder 7? So tell me, please: do they act
according to a pre-designed and fair plan, or do they test on the fly to get the
results that perhaps not they themselves but a certain company wants?





>I won't even
>mention the ridiculously low requirement by FIDE to play only 9 games to get a
>first rating. Suppose I had no rating and played 100 games against a 2000 Elo
>player and I scored 75/100.


I would not even try to compare this ridiculous SSDF Elo with the FIDE Elo.




>My performance is 2200 exactly. Is it absolute? No,
>there is a good margin of error, yet no one will question the rating and start
>telling me I'm not rated 2200, I'm just rated anywhere between 2140 and 2260. I
>see no difference.
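
The 75/100 example can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are invented here, the linear rule (400 points times the win/loss balance per game) is the one that yields exactly 2200, and the logistic Elo curve is shown next to it for comparison. The error band assumes independent games with no draws and a 95% normal approximation, which is the widest reasonable case and not anything stated in the post.

```python
import math


def performance_linear(opp_rating, score, games):
    """Linear rule: opponent rating + 400 * (wins - losses) / games."""
    fraction = score / games
    return opp_rating + 400.0 * (2.0 * fraction - 1.0)


def performance_logistic(opp_rating, score, games):
    """Invert the logistic expectation E = 1 / (1 + 10 ** (-d / 400))."""
    fraction = score / games
    return opp_rating + 400.0 * math.log10(fraction / (1.0 - fraction))


def performance_interval(opp_rating, score, games, z=1.96):
    """Rating band from a normal approximation to the score fraction.

    Assumes every game is a win or a loss (no draws) and that the band
    stays strictly between 0% and 100%.
    """
    fraction = score / games
    se = math.sqrt(fraction * (1.0 - fraction) / games)
    low, high = fraction - z * se, fraction + z * se
    to_rating = lambda f: opp_rating + 400.0 * math.log10(f / (1.0 - f))
    return to_rating(low), to_rating(high)


if __name__ == "__main__":
    print(round(performance_linear(2000, 75, 100)))    # linear rule
    print(round(performance_logistic(2000, 75, 100)))  # logistic curve
    low, high = performance_interval(2000, 75, 100)
    print(round(low), round(high))
```

The linear rule gives 2200, the logistic curve about 2191, and the no-draw 95% band comes out wider than the 2140 to 2260 quoted above, which only strengthens the point that a single number hides a lot of uncertainty.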

Yes, but I never read about "Albert now number one!" either. Only then would we have
the problem we have with the SSDF! Is that so difficult?




>They had Shredder 7 play 547 games against other programs,
>and presented the results PLUS the error margin. It *may* still be a fraction
>weaker than Deep Fritz 7,


Thank you, that is my point.


> but already it is clear that it performs better than
>Chess Tiger 15 against other computers.


Not clear from the list, but probable.
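
"Not clear, but probable" can be given a rough number. The sketch below is a back-of-the-envelope estimate under loudly stated assumptions: that each +/- margin in the quoted list is a 95% band (about 1.96 standard errors), that the errors are normal, and that the two ratings are independent, which is not strictly true because the programs also played each other. The function names are hypothetical.

```python
import math

# Assumption: each +/- margin is a 95% band, i.e. about 1.96 standard errors.
Z_95 = 1.96


def standard_error(plus, minus):
    """Approximate standard error from asymmetric margins by averaging them."""
    return (plus + minus) / 2 / Z_95


def prob_a_stronger(rating_a, se_a, rating_b, se_b):
    """P(true rating of A > true rating of B) under independent normal errors."""
    diff = rating_a - rating_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(diff / (se_diff * math.sqrt(2.0))))


if __name__ == "__main__":
    # Ratings and margins copied from the quoted SSDF list.
    shredder = (2768, standard_error(33, 31))
    deep_fritz = (2760, standard_error(29, 28))
    tiger = (2726, standard_error(27, 26))
    print(f"Shredder > Deep Fritz:  {prob_a_stronger(*shredder, *deep_fritz):.2f}")
    print(f"Shredder > Chess Tiger: {prob_a_stronger(*shredder, *tiger):.2f}")
```

Under these assumptions Shredder over Deep Fritz comes out at roughly two chances in three, while Shredder over Chess Tiger is well above 95%: probable, as said, yet the two rating bands in the list still overlap.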



> But even if another 200 games changed
>the top ratings to Shredder 7 = 2762 and DF7 = 2763 would anyone be so foolish
>as to claim one program is actually any stronger?? I certainly would never think
>of an opponent rated 10 points more as stronger. The fact that two such
>different playing styles achieve almost identical performances shows how rich
>and flexible chess is.


I have a general statement. You are completely correct, with one exception, and
that is exactly, for strange reasons, the commercial business aspect! You are
too naive here. And I say so intentionally. Because look, in your message to Eduard
you asked him if he thought that ChessBase perhaps held back Fritz 8 so as not to
hurt either the Fritz 8 business or the Shredder business.

ROFL!

I would say "both"!

And this is not a forbidden conclusion, it's so obvious.

Thanks for the sound message, and excuse me that I could still find the key of
commercial interest, Albert.

Rolf Tueschen

>
>                                         Albert
>
>>
>>(Please note that this is not a political speech; it is what statistics
>>demands. The SSDF has received this criticism so often in the past, but they
>>still haven't changed their experimental setting.)
>>
>>Rolf Tueschen



Last modified: Thu, 15 Apr 21 08:11:13 -0700
