Computer Chess Club Archives

Subject: Re: Statistical methods and their consequences (Red=Green)

Author: Tony Hedlund
Date: 10:20:19 02/18/03

On February 17, 2003 at 17:56:02, Rolf Tueschen wrote:

>On February 17, 2003 at 13:36:28, Tony Hedlund wrote:
>
>>On February 17, 2003 at 09:05:31, Rolf Tueschen wrote:
>>
>>>On February 17, 2003 at 06:53:14, Uri Blass wrote:
>>>
>>>>On February 17, 2003 at 06:29:23, Rolf Tueschen wrote:
>>>>
>>>>>On February 16, 2003 at 13:21:39, Tony Hedlund wrote:
>>>>>
>>>>>>On February 15, 2003 at 07:12:10, Rolf Tueschen wrote:
>>>>>>
>>>>>>>On February 15, 2003 at 05:24:43, Tony Hedlund wrote:
>>>>>>>
>>>>>>>>On February 14, 2003 at 16:27:31, Rolf Tueschen wrote:
>>>>>>>>
>>>>>>>>>On February 14, 2003 at 13:32:16, Tony Hedlund wrote:
>>>>>>>>>
>>>>>>>>>>On February 14, 2003 at 09:27:26, Rolf Tueschen wrote:
>>>>>>>>>>
>>>>>>>>>>>On February 14, 2003 at 08:43:12, Bob Durrett wrote:
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>Excellent points.  The "bottom line" is that SSDF presented their findings
>>>>>>>>>>>>properly, but the problem is in interpretation.  SSDF cannot be held responsible
>>>>>>>>>>>>for errors in interpretation.
>>>>>>>>>>>>
>>>>>>>>>>>>Bob D.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>Wrong conclusion. I tried to explain the points but apparently it's a bit too
>>>>>>>>>>>difficult. In short : If you use a system of statistics you are not allowed to
>>>>>>>>>>>make your own presentation. The presentation by SSDF is FALSE. That is the
>>>>>>>>>>>point. False and unallowed. Instead of 1., 2., 3., they should say 1.-3., not
>>>>>>>>>>>should, but must, if the differences in the actual results are way smaller than
>>>>>>>>>>>the error in the tests itself. Is that impossible to understand?
>>>>>>>>>>>
>>>>>>>>>>>Rolf Tueschen
>>>>>>>>>>
>>>>>>>>>>Then the right presentation is:
>>>>>>>>>>
>>>>>>>>>>1-10 Shredder 7         2801-2737
>>>>>>>>>>1-10 Deep Fritz 7       2789-2732
>>>>>>>>>>1-11 Fritz 7            2770-2711
>>>>>>>>>>1-2? Shredder 7 UCI     2761-2638
>>>>>>>>>>1-15 Chess Tiger 15     2753-2700
>>>>>>>>>>1-15 Shredder 6 Pad UCI 2750-2703
>>>>>>>>>>1-16 Shredder 6         2750-2689
>>>>>>>>>>1-19 Chess Tiger 14     2744-2684
>>>>>>>>>>1-19 Deep Fritz         2741-2680
>>>>>>>>>>1-19 Gambit Tiger 2     2739-2681
>>>>>>>>>>3-2? Junior 7           2715-2659
>>>>>>>>>>4-2? Hiarcs 8           2707-2657
>>>>>>>>>>
>>>>>>>>>>and so on.
>>>>>>>>>>
>>>>>>>>>>Tony
>>>>>>>>>
>>>>>>>>>Thanks for the fine joke, Tony. Perhaps you have put your finger on the wound!
>>>>>>>>>You want to have a number one, right? Then you make tests, just like you do,
>>>>>>>>>fair and correct. And then you come into the period where you must evaluate your
>>>>>>>>>results. You see that you have no clear number one. Now two possibilities:
>>>>>>>>>
>>>>>>>>>1) You go on into decisive mode and do further tests, the "list" date can wait.
>>>>>>>>>
>>>>>>>>>2) You stay to your traditions and show up with your list. But then, please, do
>>>>>>>>>NOT present the list either in the classical way, nor in your joking Mr. Bean
>>>>>>>>>version, but simply make such packages:
>>>>>>>>>
>>>>>>>>>1.-3. A B C
>>>>>>>>>4.-5. D E
>>>>>>>>>6.    F
>>>>>>>>>7.-10. G H I
>>>>>>>>>etc.
>>>>>>>>>
>>>>>>>>>Tell me please, where the problem is with this method?
>>>>>>>>
>>>>>>>>Why just three strongest engines? With the margin of errors Gambit Tiger 2 could
>>>>>>>>be as strong as the other top engines. I find Mr. Bean's version more logical than
>>>>>>>>yours. Could you please explain your method further.
>>>>>>>
>>>>>>>
>>>>>>>SSDF has good statistics experts. Consult these experts and you will understand
>>>>>>>why Gambit Tiger 2 could NOT be number one. My first three was a pool where all
>>>>>>>could be number one. Only Shredder 7 UCI could be included, but my example was
>>>>>>>more a demonstration of such a list. It's not MY method. It's simply what
>>>>>>>careful researchers would do if they had your results. Perhaps you don't know
>>>>>>>it, Tony, but the presentation of the results must have a base in the results.
>>>>>>
>>>>>>What do you propose SSDF do exactly? Give me a clear example of how you would
>>>>>>present the data. Don't give me this A, B and C. You have the result, which
>>>>>>programs are A, B and C?
>>>>>>
>>>>>>>In other words it might well be that one day you will have a clear number one.
>>>>>>
>>>>>>The bottom line is that when we reach a margin of error close to zero, then we
>>>>>>can claim a number one? When will that happen? After 10 000 games by each
>>>>>>entrant?
>>>>>>
>>>>>>>Or do you believe that your method guarantees the eternal status quo?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>Is it because you have
>>>>>>>>>a kind of strong wish to present a number one by all means?
>>>>>>>>
>>>>>>>>Do you also think that FIDE shouldn't have a number one on their list? Is
>>>>>>>>Kasparov really the best player?
>>>>>>>
>>>>>>>Please do not seek for outside help, when you run out of arguments in favor of
>>>>>>>your own presentation.
>>>>>>
>>>>>>FIDE, ICCF and SSDF all have a rating list. And we all use professor Arpad Elo's
>>>>>>method of measuring strength in chess. And yes, I argue for our way of presentation.
>>>>>>ICCF's number one, Ulf Andersson, has played 25 games! Figure the margin of error
>>>>>>there. They probably don't have any careful researchers.
>>>>>>
>>>>>>>>
>>>>>>>>>Please let's simply
>>>>>>>>>discuss this little topic. If you tell me, listen, Rolf, I am not allowed to
>>>>>>>>>tell you, but you are right, that a number one prog is very important for us.
>>>>>>>>
>>>>>>>>It seems to be more important to others.
>>>>>>>
>>>>>>>Yes, that was my deeper assumption. Could you give more details?
>>>>>>
>>>>>>Details?
>>>>>>People here at CCC seem to be looking forward to our next list, to see which is
>>>>>>number one. And then they congratulate the programmer. And of course the
>>>>>>commercial vendors use it in their advertisements, as they always have. When we
>>>>>>started our list, it was as a complement to our reviews of new programs.
>>>>>>Personally I'm not interested in which program is number one. I'm more interested
>>>>>>in how the different engines are playing.
>>>>>
>>>>>I can well imagine your personal sentiments and I have great respect for your
>>>>>efforts with SSDF as a whole but you can't stop history's progress. When you
>>>>>played move by move with the ancient chessboards your dedication and hard work
>>>>>was really sensational and people got results for their virgin background. Today
>>>>>- with autoplayed games - you have more time to do sound statistics. However, if
>>>>>simply the top programs do not differ that much then you can't call out a number
>>>>>one. Or you play millions of games. But who guarantees you that then you will
>>>>>have a clear first? No - you should accept the actual reality. And that is
>>>>>equality among the top entries.
>>>>>
>>>>>You are mistaken if you think that the thankfulness of the CC users was linked
>>>>>with your presentation of a number one. It was because of your general efforts
>>>>>for the good of CC.
>>>>>And the business world at that time was very colourful. But
>>>>>today we have a single important company. Do you want to do your job for them
>>>>>and their marketing interests  or for the users around the world? You must
>>>>>accept that if statistically you have no clear first then you can't present a
>>>>>number one program.
>>>>
>>>>Number one only means leading it does not mean best.
>>>>I do not see what is your problem with it.
>>>>
>>>>
>>>>>Why does that bother you??? You are independent! But
>>>>>independent does not mean naive. Why don't you consider the consequences of such
>>>>>strange events: Fritz8 is out for months and you don't test it. I read that you
>>>>>wait until ChessBase will send you a copy. But that then would no longer speak
>>>>>for your independent tests. Because factor time of testbeginning always was a
>>>>>factor. All such dangers and difficulties you could avoid with sound statistics
>>>>>and certain basic guidelines. You must become independent of such marketing
>>>>>decisions by ChessBase.
>>>>
>>>>I do not see what is the problem with waiting for chessbase to send the program.
>>>>It is not that they do everything that chessbase tell them and
>>>>I believe that if chessbase ask them not to test programs of another company
>>>>like Tiger they will not do it.
>>>>
>>>>I believe that they should test only if programmers ask them otherwise they may
>>>>waste time on testing the wrong versions and they will have no computer time
>>>>to test the right versions.
>>>>
>>>>They did not test a lot of programs and Fritz8 is not alone.
>>>>They did not test Movei and hundreds of free programs and I see no reason that
>>>>testing Fritz8 is more important when the programmer did not ask them to do it.
>>>>
>>>>Note that I did not ask them to test Movei and I do not complain (Maybe I will
>>>>ask them in the future when Movei will be significantly better).
>>>>
>>>>Note also that testing Fritz8 is more important than testing Movei if both
>>>>programmers ask them to do it but if chessbase do not ask them to do it then
>>>>buying Fritz8 in order to test it may be a waste of time because they will
>>>>have no time to test stronger Fritz.
>>>>
>>>>I think that the customers may also be interested in the rating of Fritz that
>>>>chessbase send them because I believe that the customers will get the same Fritz
>>>>as an update and if the ssdf waste time now on testing Fritz8 they will have no
>>>>computer time to test the upgrade that chessbase may release.
>>>>
>>>>Uri
>>>
>>>
>>>You have interesting views on independence. Please come into CTF so that we can
>>>talk about Israel. What you say is unacceptable from the point of view of
>>>independent testing. You don't believe it, but then you have no knowledge about
>>>the necessities of statistics. It's not a moral or such, it's a must! Otherwise the
>>>results are NOT independent and you can trash SSDF.
>>
>>What you are saying is, since our number one is a program from Chessbase then we
>>can't be independent. If Ruffian was number one this thread wouldn't have
>>started, would it?
>
>No, where did I say such a nonsense? Please learn English before you make such
>conclusions. I think I know what you are doing here. Instead of answering
>http://www.talkchess.com/forums/1/message.html?284772, what you _couldn't_, you
>step in here [what is normally no problem, but here it _is_ a problem!] without
>exact understanding for the language of a message and try to stir confusion. The
>reason why you do that is clear. You know that you have no justification for
>your presentation of a number "one" and you see critics, so there is a single
>possibility and that is stirring confusion, so that the reader should hear you
>saying: "well, you know this is Rolf, what could he have to say? We, the SSDF,
>are in the business for decades!" But all such doctoring does NOT change the
>fact that you have no base for the presenting of Shredder 7 as "number one".

It seems to me that you are running out of arguments, and so the insults start.

>>>You are giving your personal opinions and nobody is allowed to attack you so far
>>>but what is if you simply had no idea what is going on here? You have no
>>>understanding for the meaning of average terms embedded in daily speech. You say
>>>but they only tell us who is leading! That doesn't mean that he's the best. But
>>>Uri, that is NOT the point at all. The point is that they cannot conclude that
>>>someone is leading with these 8 points and a margin of 30 on both sides.
>>
>>But we can!
>
>No, you can't! - Of course you can do what you want. Next time you could present
>X as new number one with 1 point advantage and 60 points of margin.

Exactly!
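
[Editorial sketch] The disagreement above turns on whether a small rating lead means anything when each rating carries a much larger margin. A minimal Python sketch of that check follows. The ratings 2768 and 2760 are hypothetical values chosen only to reproduce "8 points and a margin of 30 on both sides"; they are not SSDF figures. The overlap test is a deliberately simple stand-in for a proper significance test.

```python
def interval(rating, margin):
    """Return the (low, high) band around a rating for a symmetric margin."""
    return (rating - margin, rating + margin)


def intervals_overlap(a, b):
    """True if two (low, high) bands share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical ratings: an 8-point lead with a 30-point margin on each side.
leader = interval(2768, 30)
runner_up = interval(2760, 30)

# The bands overlap, so the order of the two programs is not settled
# by this simple criterion.
print(intervals_overlap(leader, runner_up))
```

With the same 8-point lead and a hypothetical margin of 3 points per side, the bands would no longer overlap, which is the situation Rolf describes as a "clear number one".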

>>As you pointed out earlier, and I quote: "SSDF has good statistics
>>experts".
>
>
>Did I say that? Yes, often I like irony.

So now it was irony?

>
>
>>
>>>You
>>>have no idea what that exactly means!
>>
>>Speak for yourself.
>
>Sure, that is what I always do! I am famous for it and therefore certain
>interested groups don't like me. But what is your business here? Uri and I have
>a communication for months now and you seem to feel envy?

Running out of arguments? You said to me, and I quote: "Please let's simply
discuss this little topic." So I was under the impression that this thread was
between us.

>>>So then you can well talk about "Let them
>>>do what they do, they are not doing something wrong"! Uri, they are so wrong,
>>>more than your own Prime Minister! Because they do something very special:
>>>
>>>They say that Shredder7 is the new number one, the new leader as you say. And
>>>they give these margins! Together that means: Folks, we have no clear result for
>>>place one! And I argue against the mistakes. But here in CCC experts behave as
>>>if the margins would make the overall verdict ok, because the experts know what
>>>margins mean. I translate: experts are saying that a lie is not a lie as long as
>>>the experts have a possibility to see whats really going on.
>>
>>YOU say it's a lie. That's your opinion, not a fact.
>
>
>Again, please try to learn English before you step in other people's debates. I
>did NOT say what you believe here.

More insults? Other people's debate? You said, and I quote: "But here in CCC
experts behave as if the margins would make the overall verdict ok, because the
experts know what margins mean. I translate: experts are saying that a lie is
not a lie as long as the experts have a possibility to see whats really going
on."


>>
>>>But the lack of
>>>respect for the dumb users is well allowed, because that is business.
>>
>>We have respect for the users, it's for them we are doing the list. But we have
>>no respect for DUMB users.
>
>Oh well, that will be a candidate for the quote of the year!
>
>
>
>>
>>>Against
>>>that confusion I say, no no, SSDF is responsible because THEY announced new
>>>number 1!
>>
>>Yes Rolf, SSDF is responsible for having a number 1 in the list.
>
>Yes, and that is why I criticised the faults of SSDF. Namely presenting a number
>one that is not number one.

But it is number one, within the margin of error.

>I think a good analogy is this: you write a message
>here with "Tony" and you supply a photo that is showing a man with _green_ hair.
>Then in the header line you say "Tony" ("see photo, the man with the red [sic!]
>hair"). Then Rolf writes a critique and shows that green hair is not the same as
>red hair. Then Tony writes a message "we in SSDF have a long experience and
>never before users criticised us for the presentation of wrong-colored hair;
>only dumb users like Rolf have a problem with the difference between red and
>green hair; in Sweden the two colors are the _same_!!! We in SSDF also have many
>good color experts."
>
>:)

Thanks for the fine joke, Rolf.

Tony

>Rolf Tueschen
>
>>
>>Tony
>>
>>>Rolf Tueschen
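
[Editorial sketch] The rank ranges in the list quoted earlier in this thread ("1-10 Shredder 7 2801-2737" and so on) can be reproduced from the rating bands alone. The Python sketch below assumes one plausible rule, which the thread does not state explicitly: a program's best possible rank is one plus the number of programs whose whole band lies above its band, and its worst possible rank is one plus the number of programs whose band reaches above its lower bound. The function name `rank_range` is made up for illustration. Only the twelve programs quoted in the thread are included, so worst ranks beyond 12 (the "15", "16", "19" and "2?" entries) cannot be reproduced here.

```python
# Rating bands (upper, lower) copied from the list quoted in this thread.
INTERVALS = {
    "Shredder 7": (2801, 2737),
    "Deep Fritz 7": (2789, 2732),
    "Fritz 7": (2770, 2711),
    "Shredder 7 UCI": (2761, 2638),
    "Chess Tiger 15": (2753, 2700),
    "Shredder 6 Pad UCI": (2750, 2703),
    "Shredder 6": (2750, 2689),
    "Chess Tiger 14": (2744, 2684),
    "Deep Fritz": (2741, 2680),
    "Gambit Tiger 2": (2739, 2681),
    "Junior 7": (2715, 2659),
    "Hiarcs 8": (2707, 2657),
}


def rank_range(name, intervals):
    """Best and worst rank a program could hold given everyone's bands."""
    upper, lower = intervals[name]
    others = [band for other, band in intervals.items() if other != name]
    # Programs whose entire band sits above ours must rank ahead of us.
    best = 1 + sum(1 for (_, other_lower) in others if other_lower > upper)
    # Programs whose band reaches above our lower bound could rank ahead of us.
    worst = 1 + sum(1 for (other_upper, _) in others if other_upper > lower)
    return best, worst


for name in INTERVALS:
    best, worst = rank_range(name, INTERVALS)
    print(f"{best}-{worst} {name}")
```

Under this assumed rule the first three entries come out as 1-10, 1-10 and 1-11, and Junior 7 and Hiarcs 8 start at ranks 3 and 4, which agrees with the quoted list wherever the twelve listed programs are enough to decide the range.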