Computer Chess Club Archives


Subject: Re: Statistical methods and their consequences (Red=Green)

Author: Rolf Tueschen

Date: 13:11:43 02/18/03


On February 18, 2003 at 13:20:19, Tony Hedlund wrote:

>On February 17, 2003 at 17:56:02, Rolf Tueschen wrote:
>
>>On February 17, 2003 at 13:36:28, Tony Hedlund wrote:
>>
>>>On February 17, 2003 at 09:05:31, Rolf Tueschen wrote:
>>>
>>>>On February 17, 2003 at 06:53:14, Uri Blass wrote:
>>>>
>>>>>On February 17, 2003 at 06:29:23, Rolf Tueschen wrote:
>>>>>
>>>>>>On February 16, 2003 at 13:21:39, Tony Hedlund wrote:
>>>>>>
>>>>>>>On February 15, 2003 at 07:12:10, Rolf Tueschen wrote:
>>>>>>>
>>>>>>>>On February 15, 2003 at 05:24:43, Tony Hedlund wrote:
>>>>>>>>
>>>>>>>>>On February 14, 2003 at 16:27:31, Rolf Tueschen wrote:
>>>>>>>>>
>>>>>>>>>>On February 14, 2003 at 13:32:16, Tony Hedlund wrote:
>>>>>>>>>>
>>>>>>>>>>>On February 14, 2003 at 09:27:26, Rolf Tueschen wrote:
>>>>>>>>>>>
>>>>>>>>>>>>On February 14, 2003 at 08:43:12, Bob Durrett wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>Excellent points.  The "bottom line" is that SSDF presented their findings
>>>>>>>>>>>>>properly, but the problem is in interpretation.  SSDF cannot be held responsible
>>>>>>>>>>>>>for errors in interpretation.
>>>>>>>>>>>>>
>>>>>>>>>>>>>Bob D.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>Wrong conclusion. I tried to explain the points but apparently it's a bit too
>>>>>>>>>>>>difficult. In short : If you use a system of statistics you are not allowed to
>>>>>>>>>>>>make your own presentation. The presentation by SSDF is FALSE. That is the
>>>>>>>>>>>>point. False and unallowed. Instead of 1., 2., 3., they should say 1.-3., not
>>>>>>>>>>>>should, but must, if the differences in the actual results are way smaller than
>>>>>>>>>>>>the error in the tests itself. Is that impossible to understand?
>>>>>>>>>>>>
>>>>>>>>>>>>Rolf Tueschen
>>>>>>>>>>>
>>>>>>>>>>>Then the right presentation is:
>>>>>>>>>>>
>>>>>>>>>>>1-10 Shredder 7         2801-2737
>>>>>>>>>>>1-10 Deep Fritz 7       2789-2732
>>>>>>>>>>>1-11 Fritz 7            2770-2711
>>>>>>>>>>>1-2? Shredder 7 UCI     2761-2638
>>>>>>>>>>>1-15 Chess Tiger 15     2753-2700
>>>>>>>>>>>1-15 Shredder 6 Pad UCI 2750-2703
>>>>>>>>>>>1-16 Shredder 6         2750-2689
>>>>>>>>>>>1-19 Chess Tiger 14     2744-2684
>>>>>>>>>>>1-19 Deep Fritz         2741-2680
>>>>>>>>>>>1-19 Gambit Tiger 2     2739-2681
>>>>>>>>>>>3-2? Junior 7           2715-2659
>>>>>>>>>>>4-2? Hiarcs 8           2707-2657
>>>>>>>>>>>
>>>>>>>>>>>and so on.
>>>>>>>>>>>
>>>>>>>>>>>Tony
>>>>>>>>>>
>>>>>>>>>>Thanks for the fine joke, Tony. Perhaps you have put your finger on the wound!
>>>>>>>>>>You want to have a number one, right? Then you make tests, just like you do,
>>>>>>>>>>fair and correct. And then you come into the period where you must evaluate your
>>>>>>>>>>results. You see that you have no clear number one. Now two possibilities:
>>>>>>>>>>
>>>>>>>>>>1) You go on into decisive mode and do further tests, the "list" date can wait.
>>>>>>>>>>
>>>>>>>>>>2) You stick to your traditions and show up with your list. But then, please, do
>>>>>>>>>>NOT present the list in the classical way, nor in your joking Mr. Bean
>>>>>>>>>>version, but simply make such packages:
>>>>>>>>>>
>>>>>>>>>>1.-3. A B C
>>>>>>>>>>4.-5. D E
>>>>>>>>>>6.    F
>>>>>>>>>>7.-10. G H I
>>>>>>>>>>etc.
>>>>>>>>>>
>>>>>>>>>>Tell me please, where the problem is with this method?
>>>>>>>>>
>>>>>>>>>Why just the three strongest engines? With the margins of error Gambit Tiger 2 could
>>>>>>>>>be as strong as the other top engines. I find Mr. Bean's version more logical than
>>>>>>>>>yours. Could you please explain your method further?
>>>>>>>>
>>>>>>>>
>>>>>>>>SSDF has good statistics experts. Consult these experts and you will understand
>>>>>>>>why Gambit Tiger 2 could NOT be number one. My first three was a pool where all
>>>>>>>>could be number one. Only Shredder 7 UCI could be included, but my example was
>>>>>>>>more a demonstration of such a list. It's not MY method. It's simply what
>>>>>>>>careful researchers would do if they had your results. Perhaps you don't know
>>>>>>>>it, Tony, but the presentation of the results must have a base in the results.
>>>>>>>
>>>>>>>What do you propose SSDF do exactly? Give me a clear example of how you would
>>>>>>>present the data. Don't give me this A, B and C. You have the results; which
>>>>>>>programs are A, B and C?
>>>>>>>
>>>>>>>>In other words it might well be that one day you will have a clear number one.
>>>>>>>
>>>>>>>The bottom line is that when we reach a margin of error close to zero, then we
>>>>>>>can claim a number one? When will that happen? After 10 000 games by each
>>>>>>>entrant?
>>>>>>>
>>>>>>>>Or do you believe that your method guarantees the eternal status quo?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>Is it because you have
>>>>>>>>>>a kind of strong wish to present a number one by all means?
>>>>>>>>>
>>>>>>>>>Do you also think that FIDE shouldn't have a number one on their list? Is
>>>>>>>>>Kasparov really the best player?
>>>>>>>>
>>>>>>>>Please do not seek for outside help, when you run out of arguments in favor of
>>>>>>>>your own presentation.
>>>>>>>
>>>>>>>FIDE, ICCF and SSDF all have a rating list. And we all use Professor Arpad Elo's
>>>>>>>method of measuring strength in chess. And yes, I argue for our way of presentation.
>>>>>>>ICCF's number one, Ulf Andersson, has played 25 games! Figure the margin of error
>>>>>>>there. They probably don't have any careful researchers.
>>>>>>>
>>>>>>>>>
>>>>>>>>>>Please let's simply
>>>>>>>>>>discuss this little topic. If you tell me, listen, Rolf, I am not allowed to
>>>>>>>>>>tell you, but you are right, that a number one prog is very important for us.
>>>>>>>>>
>>>>>>>>>It seems to be more important to others.
>>>>>>>>
>>>>>>>>Yes, that was my deeper assumption. Could you give more details?
>>>>>>>
>>>>>>>Details?
>>>>>>>People here at CCC seem to be looking forward to our next list, to see which is
>>>>>>>number one. And then they congratulate the programmer. And of course the
>>>>>>>commercials use it in their advertisements. As they always have. When we started
>>>>>>>our list, it was as a complement to our reviews of new programs.
>>>>>>>Personally I'm not interested in which program is number one. I'm more interested
>>>>>>>in how the different engines are playing.
>>>>>>
>>>>>>I can well imagine your personal sentiments and I have great respect for your
>>>>>>efforts with SSDF as a whole but you can't stop history's progress. When you
>>>>>>played move by move with the ancient chessboards your dedication and hard work
>>>>>>was really sensational and people got results for their virgin background. Today
>>>>>>- with autoplayed games - you have more time to do sound statistics. However, if
>>>>>>simply the top programs do not differ that much then you can't call out a number
>>>>>>one. Or you play millions of games. But who guarantees you that then you will
>>>>>>have a clear first? No - you should accept the actual reality. And that is
>>>>>>equality among the top entries.
>>>>>>
>>>>>>You are mistaken if you think that the thankfulness of the CC users was linked
>>>>>>with your presentation of a number one. It was because of your general efforts
>>>>>>to the best of CC.
>>>>> And the business world at that time was very coloured. But
>>>>>>today we have a single important company. Do you want to do your job for them
>>>>>>and their marketing interests  or for the users around the world? You must
>>>>>>accept that if statistically you have no clear first then you can't present a
>>>>>>number one program.
>>>>>
>>>>>Number one only means leading it does not mean best.
>>>>>I do not see what is your problem with it.
>>>>>
>>>>>
>>>>> What does that bother you??? You are independent! But
>>>>>>independent does not mean naive. Why don't you consider the consequences of such
>>>>>>strange events: Fritz8 has been out for months and you don't test it. I read that you
>>>>>>wait until ChessBase will send you a copy. But that then would no longer speak
>>>>>>for your independent tests. Because the time at which testing begins always was a
>>>>>>factor. All such dangers and difficulties you could avoid with sound statistics
>>>>>>and certain basic guidelines. You must become independent of such marketing
>>>>>>decisions by ChessBase.
>>>>>
>>>>>I do not see what is the problem with waiting for chessbase to send the program.
>>>>>It is not that they do everything that chessbase tell them and
>>>>>I believe that if chessbase ask them not to test programs of another company
>>>>>like Tiger they will not do it.
>>>>>
>>>>>I believe that they should test only if programmers ask them otherwise they may
>>>>>waste time on testing the wrong versions and they will have no computer time
>>>>>to test the right versions.
>>>>>
>>>>>They did not test a lot of programs and Fritz8 is not alone.
>>>>>They did not test Movei and hundreds of free programs and I see no reason that
>>>>>testing Fritz8 is more important when the programmer did not ask them to do it.
>>>>>
>>>>>Note that I did not ask them to test Movei and I do not complain(Maybe I will
>>>>>ask them in the future when Movei will be significantly better).
>>>>>
>>>>>Note also that testing Fritz8 is more important than testing Movei if both
>>>>>programmers ask them to do it but if chessbase do not ask them to do it then
>>>>>buying Fritz8 in order to test it may be a waste of time because they will
>>>>>have no time to test stronger Fritz.
>>>>>
>>>>>I think that the customers may also be interested in the rating of the Fritz that
>>>>>chessbase sends them because I believe that the customers will get the same Fritz
>>>>>as an update and if the ssdf waste time now on testing Fritz8 they will have no
>>>>>computer time to test the upgrade that chessbase may release.
>>>>>
>>>>>Uri
>>>>
>>>>
>>>>You have interesting views on independence. Please come into CTF so that we can
>>>>talk about Israel. What you say is unacceptable from the point of view of independent
>>>>testing. You don't believe it, but then you have no knowledge about the
>>>>necessities of statistics. It's not a moral or such, it's a must! Otherwise the
>>>>results are NOT independent and you can trash SSDF.
>>>
>>>What you are saying is, since our number one is a program from Chessbase then we
>>>can't be independent. If Ruffian was number one this thread wouldn't have
>>>started, would it?
>>
>>No, where did I say such nonsense? Please learn English before you draw such
>>conclusions. I think I know what you are doing here. Instead of answering
>>http://www.talkchess.com/forums/1/message.html?284772, which you _couldn't_, you
>>step in here [what is normally no problem, but here it _is_ a problem!] without
>>exact understanding of the language of a message and try to stir confusion. The
>>reason why you do that is clear. You know that you have no justification for
>>your presentation of a number "one" and you see critics, so there is a single
>>possibility and that is stirring confusion, so that the reader should hear you
>>saying: "well, you know this is Rolf, what could he have to say? We, the SSDF,
>>are in the business for decades!" But all such doctoring does NOT change the
>>fact that you have no base for the presenting of Shredder 7 as "number one".
>
>It seems to me that you are running out of arguments, and so the insults start.

It's the other way round. I gave my strongest argument that you must create
confusion (it's only Rolf, it's only against ChessBase), because you have no
base (statistically) for the presentation of a number one. And sure - you still
have no arguments. Therefore you now invent a new confusion, namely that I would
'insult'. Could you tell me where I insult? Where exactly? Why should I insult
you? You have never done me wrong in the past, unlike your colleagues Bertil
and Peter F. No, I declare that I had no reason to insult you and would never do
that. For me this is more a psychological topic. I ask myself why such a
decent person as yourself suddenly goes into such a mode of self-pity.

We all here, me included, respect you in SSDF for the huge work you've done over
the decades. When I had the possibility to ask my questions in 1996, I was so
happy, after I had followed your list for so many years. But from the beginning I
observed incredibly weak reactions. I will never forget the expression used for
critics, namely "member of the Czub Anti-SSDF gang [sic!!]". That is ridiculous
to me because my questions came straight from my university education
and mathematics. Suddenly I was accused, defamed as a member of a gang! That
was in 1996.

In the meantime I have published so many faults in your methodology and always the
main reply was "we are amateurs, not scientists".

Let me give you what is probably the most serious argument against your test methods.
You always argue that FIDE has Elo lists, and you want to imply that your list
would just be the same or at least similar. I object. For very basic reasons.
Elo for human players has data a) for thousands of players and b) for thousands
of games for each player. Many players will have a record over a period of 30
years and more. The database of publicly known games holds about 2.5 million
games.

Now let's take a look what _you_ have. No insult meant, Tony, honestly.

You know as I do that you have modern "players" [program versions] with a
life of 12 months on possibly different hardware. You always claim that you
have a database of 60000 games. To exploit that pool you always declare that
therefore a modern program MUST also be paired with a rather antique program.
Then you claim that validity is assured through some 30 games of Swedish players
20 years ago...

You know what I know? I can tell you. With such conditions you have no base for
a reasonable list. You have 5 or 10 programs each season that are comparable.
You have no justification to start the tests always with a number of 1500 or
such because the new version has ZERO Elo. And now you construct GM results with
inbreeding technology. With Elo all this has nothing to do. You have no history
in your ranking. What you have is the artificial combining of representatives of
different species from different historic pasts. But these "representatives" have
surprisingly no history of their own in the development of hardware, for instance. But
you don't notice the basic fault. I explained it many times. If you use different
hardware you can't test the strengths of programs. Nobody in SSDF understood
this although it is a very trivial argument or truth.

NB the difference to human Elo numbers. Look: Smyslov once was a World Champion,
right? He is still playing today. But his performance is down to 2450 or
something. But this is because of his age. Let's now take a look at SSDF's
former world-leading programs. Excuse me, Tony, I have no data about the early
results, but back in time MEPHISTO III surely was a good program. Or MChess 1.
Just take a prog out of that time. Why does such a program no longer play
today??? Why don't you test MChess 1 on Pentium IV??? That is what you should do
among other things. But what you do in reality is this: You never tire of
testing the newest versions of the companies' progs. You have no interest in the
whereabouts of your earlier favorites, it's as if you all threw them into the
bin. And that makes your list so artificial and false!

I know that you could say that it makes no sense to let MChess 1 play because
1) we had MChess 7 and 2) there is no sense in letting MChess 1 play on P4.

But if that is the case then you should admit that you can NOT compare your
"list" with the human Elo list.

Tony, I invite you to think about all this - if you have time. Let's discuss
this in a friendly atmosphere. Perhaps we can find a new base for SSDF.







>
>>>>You are giving your personal opinions and nobody is allowed to attack you so far
>>>>but what if you simply had no idea what is going on here? You have no
>>>>understanding of the meaning of ordinary terms embedded in daily speech. You say:
>>>>but they only tell us who is leading! That doesn't mean that he's the best. But
>>>>Uri, that is NOT the point at all. The point is that they cannot conclude that
>>>>someone is leading with these 8 points and a margin of 30 on both sides.
>>>
>>>But we can!
>>
>>No, you can't! - Of course you can do what you want. Next time you could present
>>X as new number one with 1 point advantage and 60 points of margin.
>
>Exactly!

In your own interest you should reconsider that opinion.
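
To put numbers on the two cases above (8 points of lead with a margin of 30 on both sides, and 1 point of lead with a margin of 60), here is a minimal sketch. It is an illustration only and rests on assumptions that are not stated in the SSDF list: that the published margins are 95% intervals, that the two ratings are independent, and that a normal approximation applies. The function name prob_leader_ahead is hypothetical.

```python
import math

Z95 = 1.96  # assumption: the published margins are 95% intervals


def prob_leader_ahead(lead, margin_a, margin_b, z=Z95):
    """Rough chance that the listed leader is really the stronger program.

    lead      -- rating difference between leader and runner-up
    margin_a  -- +/- margin of the leader (same unit as lead)
    margin_b  -- +/- margin of the runner-up
    Assumes independent, normally distributed rating errors.
    """
    # Standard error of the difference of two independent estimates.
    se_diff = math.hypot(margin_a, margin_b) / z
    # Normal CDF evaluated at lead / se_diff, written with math.erf.
    return 0.5 * (1.0 + math.erf(lead / se_diff / math.sqrt(2.0)))


if __name__ == "__main__":
    print(round(prob_leader_ahead(8, 30, 30), 3))
    print(round(prob_leader_ahead(1, 60, 60), 3))
```

Under these assumptions the first case comes out at roughly 0.64 and the second at roughly 0.51, where 0.50 would be a coin flip. That is a long way from the 0.95 or more one would normally want before announcing a leader.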



>
>>>As you pointed out earlier, and I quote, "SSDF has good statistics
>>>experts".
>>
>>
>>Did I say that? Yes, often I like irony.
>
>So now it was irony?

Of course. It was clear because everybody knows my criticism of your false
methodology. It's here in the archives and also on my homepage. See:
http://hometown.aol.de/rolftueschen/rolftueschenmosaik.html




>
>>
>>
>>>
>>>>You
>>>>have no idea what that exactly means!
>>>
>>>Speak for yourself.
>>
>>Sure, that is what I always do! I am famous for it and therefore certain
>>interested groups don't like me. But what is your business here? Uri and I have
>>a communication for months now and you seem to feel envy?
>
>Running out of arguments? You said to me, and I quote, "Please let's simply
>discuss this little topic." So I was under the impression that this thread was
>between us.

Yes. But here I was addressing Uri, as you can see below. You stepped in here
but you didn't answer the other message I wrote.



>
>>>>So then you can well talk about "Let them
>>>>do what they do, they are not doing something wrong"! Uri, they are so wrong,
>>>>more than your own Prime Minister! Because they do something very special:
>>>>
>>>>They say that Shredder7 is the new number one, the new leader as you say. And
>>>>they give these margins! Together that means: Folks, we have no clear result for
>>>>place one! And I argue against the mistakes. But here in CCC experts behave as
>>>>if the margins would make the overall verdict ok, because the experts know what
>>>>margins mean. I translate: experts are saying that a lie is not a lie as long as
>>>>the experts have a possibility to see what's really going on.
>>>
>>>YOU say it's a lie. That's your opinion, not a fact.
>>
>>
>>Again, please try to learn English before you step in other people's debates. I
>>did NOT say what you believe here.
>
>More insults? Other people's debate? You said, and I quote, "But here in CCC
>experts behave as if the margins would make the overall verdict ok, because the
>experts know what margins mean. I translate: experts are saying that a lie is
>not a lie as long as the experts have a possibility to see what's really going
>on."

Yes, and that is the truth. I read more than once that experts here said that
possible errors in SSDF were of no importance because the experts knew what
was meant. Interesting, because the list is published in chess journals where
thousands of users read it, users without expert status. So this is not an honest
debate. IMO.





>
>
>>>
>>>>But the lack of
>>>>respect for the dumb users is well allowed, because that is business.
>>>
>>>We have respect for the users, it's for them we are doing the list. But we have
>>>no respect for DUMB users.
>>
>>Oh well, that will be a candidate for the quote of the year!
>>
>>
>>
>>>
>>>>Against
>>>>that confusion I say, no no, SSDF is responsible because THEY announced a new
>>>>number 1!
>>>
>>>Yes Rolf, SSDF is responsible for having a number 1 in the list.
>>
>>Yes, and that is why I criticised the faults of SSDF. Namely presenting a number
>>one that is not number one.
>
>But it is number one, within the margins of error.


No! Within the margins you have no way to know who is first of the three progs.
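
As a concrete illustration, here is a minimal sketch using the rating bounds Tony listed earlier in this thread. The assumptions are mine and not part of the SSDF list: each rating is taken as the midpoint of its two bounds, both margins are at the same confidence level, the ratings are independent, and every program is compared only with the top-rated one (no correction for multiple comparisons). The function names are hypothetical.

```python
import math

# (upper, lower) rating bounds as quoted earlier in this thread.
BOUNDS = {
    "Shredder 7": (2801, 2737),
    "Deep Fritz 7": (2789, 2732),
    "Fritz 7": (2770, 2711),
    "Shredder 7 UCI": (2761, 2638),
    "Chess Tiger 15": (2753, 2700),
    "Shredder 6 Pad UCI": (2750, 2703),
    "Shredder 6": (2750, 2689),
    "Chess Tiger 14": (2744, 2684),
    "Deep Fritz": (2741, 2680),
    "Gambit Tiger 2": (2739, 2681),
    "Junior 7": (2715, 2659),
    "Hiarcs 8": (2707, 2657),
}


def midpoint_and_margin(upper, lower):
    """Assumption: the rating is the midpoint and the margin is symmetric."""
    return (upper + lower) / 2, (upper - lower) / 2


def could_be_first(bounds):
    """Names of programs that cannot be separated from the top-rated one."""
    stats = {name: midpoint_and_margin(*b) for name, b in bounds.items()}
    leader = max(stats, key=lambda name: stats[name][0])
    lead_rating, lead_margin = stats[leader]
    group = []
    for name, (rating, margin) in stats.items():
        gap = lead_rating - rating
        # Margin of a difference of two independent estimates.
        combined = math.hypot(lead_margin, margin)
        if gap <= combined:
            group.append(name)
    return group


if __name__ == "__main__":
    print(could_be_first(BOUNDS))
```

Under these assumptions the group that shares first place is Shredder 7, Deep Fritz 7 and Fritz 7. Shredder 7 UCI falls just outside it, and only by a fraction of a point, so it is a borderline case. A different confidence level or a correction for multiple comparisons would move the boundary.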



>
>>I think a good analogy is this: you write a message
>>here with "Tony" and you supply a photo that is showing a man with _green_ hair.
>>Then in the header line you say "Tony" ("see photo, the man with the red [sic!]
>>hair"). Then Rolf writes a critique and shows that green hair is not the same as
>>red hair. Then Tony writes a message "we in SSDF have a long experience and
>>never before have users criticised us for the presentation of wrong-colored hair;
>>only dumb users like Rolf have a problem with the difference between red and
>>green hair; in Sweden the two colors are the _same_!!! We in SSDF also have many
>>good color experts."
>>
>>:)
>
>Thanks for the fine joke, Rolf.


Do you take jokes as personal insults? Please let's not go into that mode. I
have great respect for you. And that does not change if you support errors in
the SSDF list. I think we can discuss this and hope that it could be changed. As
long as you don't call me names or make open insults, I try to give friendly
opinions.

Rolf Tueschen




>
>Tony
>
>>Rolf Tueschen
>>
>>>
>>>Tony
>>>
>>>>Rolf Tueschen



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.