Computer Chess Club Archives


Subject: Re: Statistical methods and their consequences

Author: Tony Hedlund

Date: 09:07:18 02/20/03


On February 20, 2003 at 09:32:38, Rolf Tueschen wrote:

>On February 20, 2003 at 09:12:18, Tony Hedlund wrote:
>
>>On February 18, 2003 at 16:22:58, Rolf Tueschen wrote:
>>
>>>On February 18, 2003 at 12:53:52, Tony Hedlund wrote:
>>>
>>>>On February 17, 2003 at 06:29:23, Rolf Tueschen wrote:
>>>>
>>>>>On February 16, 2003 at 13:21:39, Tony Hedlund wrote:
>>>>>
>>>>>>On February 15, 2003 at 07:12:10, Rolf Tueschen wrote:
>>>>>>
>>>>>>>On February 15, 2003 at 05:24:43, Tony Hedlund wrote:
>>>>>>>
>>>>>>>>On February 14, 2003 at 16:27:31, Rolf Tueschen wrote:
>>>>>>>>
>>>>>>>>>On February 14, 2003 at 13:32:16, Tony Hedlund wrote:
>>>>>>>>>
>>>>>>>>>>On February 14, 2003 at 09:27:26, Rolf Tueschen wrote:
>>>>>>>>>>
>>>>>>>>>>>On February 14, 2003 at 08:43:12, Bob Durrett wrote:
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>Excellent points.  The "bottom line" is that SSDF presented their findings
>>>>>>>>>>>>properly, but the problem is in interpretation.  SSDF cannot be held responsible
>>>>>>>>>>>>for errors in interpretation.
>>>>>>>>>>>>
>>>>>>>>>>>>Bob D.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>Wrong conclusion. I tried to explain the points but apparently it's a bit too
>>>>>>>>>>>difficult. In short : If you use a system of statistics you are not allowed to
>>>>>>>>>>>make your own presentation. The presentation by SSDF is FALSE. That is the
>>>>>>>>>>>point. False and unallowed. Instead of 1., 2., 3., they should say 1.-3., not
>>>>>>>>>>>should, but must, if the differences in the actual results are way smaller than
>>>>>>>>>>>the error in the tests themselves. Is that impossible to understand?
>>>>>>>>>>>
>>>>>>>>>>>Rolf Tueschen
>>>>>>>>>>
>>>>>>>>>>Then the right presentation is:
>>>>>>>>>>
>>>>>>>>>>1-10 Shredder 7         2801-2737
>>>>>>>>>>1-10 Deep Fritz 7       2789-2732
>>>>>>>>>>1-11 Fritz 7            2770-2711
>>>>>>>>>>1-2? Shredder 7 UCI     2761-2638
>>>>>>>>>>1-15 Chess Tiger 15     2753-2700
>>>>>>>>>>1-15 Shredder 6 Pad UCI 2750-2703
>>>>>>>>>>1-16 Shredder 6         2750-2689
>>>>>>>>>>1-19 Chess Tiger 14     2744-2684
>>>>>>>>>>1-19 Deep Fritz         2741-2680
>>>>>>>>>>1-19 Gambit Tiger 2     2739-2681
>>>>>>>>>>3-2? Junior 7           2715-2659
>>>>>>>>>>4-2? Hiarcs 8           2707-2657
>>>>>>>>>>
>>>>>>>>>>and so on.
>>>>>>>>>>
>>>>>>>>>>Tony
>>>>>>>>>
>>>>>>>>>Thanks for the fine joke, Tony. Perhaps you put your finger in the wound!
>>>>>>>>>You want to have a number one, right? Then you make tests, just like you do,
>>>>>>>>>fair and correct. And then you come into the period where you must evaluate your
>>>>>>>>>results. You see that you have no clear number one. Now two possibilities:
>>>>>>>>>
>>>>>>>>>1) You go on into decisive mode and do further tests, the "list" date can wait.
>>>>>>>>>
>>>>>>>>>2) You stick to your traditions and show up with your list. But then, please, do
>>>>>>>>>NOT present the list in the classical way, nor in your joking Mr. Bean
>>>>>>>>>version, but simply make such packages:
>>>>>>>>>
>>>>>>>>>1.-3. A B C
>>>>>>>>>4.-5. D E
>>>>>>>>>6.    F
>>>>>>>>>7.-10. G H I
>>>>>>>>>etc.
>>>>>>>>>
>>>>>>>>>Tell me please, where the problem is with this method?
>>>>>>>>
>>>>>>>>Why just the three strongest engines? With the margins of error, Gambit Tiger 2 could
>>>>>>>>be as strong as the other top engines. I find Mr. Bean's version more logical than
>>>>>>>>yours. Could you please explain your method further?
>>>>>>>
>>>>>>>
>>>>>>>SSDF has good statistics experts. Consult these experts and you will understand
>>>>>>>why Gambit Tiger 2 could NOT be number one. My first three was a pool where all
>>>>>>>could be number one. Only Shredder 7 UCI could be included, but my example was
>>>>>>>more a demonstration of such a list. It's not MY method. It's simply what
>>>>>>>careful researchers would do if they had your results. Perhaps you don't know
>>>>>>>it, Tony, but the presentation of the results must have a base in the results.
>>>>>>
>>>>>>What do you propose SSDF do exactly? Give me a clear example of how you would
>>>>>>present the data. Don't give me this A, B and C. You have the results; which
>>>>>>programs are A, B and C?
>>>>>>
>>>>>>>In other words it might well be that one day you will have a clear number one.
>>>>>>
>>>>>>The bottom line is that when we reach a margin of error close to zero, then we
>>>>>>can claim a number one? When will that happen? After 10 000 games by each
>>>>>>entrant?
>>>>>>
>>>>>>>Or do you believe that your method guarantees the eternal status quo?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>Is it because you have
>>>>>>>>>a kind of strong wish to present a number one by all means?
>>>>>>>>
>>>>>>>>Do you also think that FIDE shouldn't have a number one on their list? Is
>>>>>>>>Kasparov really the best player?
>>>>>>>
>>>>>>>Please do not seek for outside help, when you run out of arguments in favor of
>>>>>>>your own presentation.
>>>>>>
>>>>>>FIDE, ICCF and SSDF all have a rating list. And we all use Professor Arpad Elo's
>>>>>>method of measuring strength in chess. And yes, I argue for our way of presentation.
>>>>>>ICCF's number one, Ulf Andersson, has played 25 games! Figure the margin of error
>>>>>>there. They probably don't have any careful researchers.
>>>>>>
>>>>>>>>
>>>>>>>>>Please let's simply
>>>>>>>>>discuss this little topic. If you tell me, listen, Rolf, I am not allowed to
>>>>>>>>>tell you, but you are right, that a number one prog is very important for us.
>>>>>>>>
>>>>>>>>It seems to be more important to others.
>>>>>>>
>>>>>>>Yes, that was my deeper assumption. Could you give more details?
>>>>>>
>>>>>>Details?
>>>>>>People here at CCC seem to be looking forward to our next list, to see which is
>>>>>>number one. And then they congratulate the programmer. And of course the
>>>>>>commercial vendors use it in their advertisements, as they always have. When we started
>>>>>>our list, it was as a complement to our reviews of new programs.
>>>>>>Personally I'm not interested in which program is number one. I'm more interested
>>>>>>in how the different engines are playing.
>>>>>
>>>>>I can well imagine your personal sentiments and I have great respect for your
>>>>>efforts with SSDF as a whole but you can't stop history's progress. When you
>>>>>played move by move with the ancient chessboards your dedication and hard work
>>>>>was really sensational and people got results for their virgin background. Today
>>>>>- with autoplayed games - you have more time to do sound statistics. However, if
>>>>>simply the top programs do not differ that much then you can't call out a number
>>>>>one. Or you play millions of games. But who guarantees you that then you will
>>>>>have a clear first? No - you should accept the actual reality. And that is
>>>>>equality among the top entries.
>>>>
>>>>That's why we have the margins of error. So the intelligent users can make that
>>>>interpretation.
>>>>
>>>>>You are misled if you think that the thankfulness of the CC users was linked
>>>>>with your presentation of a number one. It was because of your general efforts
>>>>>for the good of CC. And the business world at that time was very coloured. But
>>>>>today we have a single important company. Do you want to do your job for them
>>>>>and their marketing interests or for the users around the world? You must
>>>>>accept that if statistically you have no clear first then you can't present a
>>>>>number one program. Why does that bother you??? You are independent! But
>>>>>independent does not mean naive. Why don't you consider the consequences of such
>>>>>strange events: Fritz8 is out for months and you don't test it. I read that you
>>>>>wait until ChessBase will send you a copy. But that then would no longer speak
>>>>>for your independent tests.
>>>>
>>>>We also wait for a new version of Yace and some copies of CM9000.
>>>>
>>>>>Because the time at which testing begins was always a
>>>>>factor. All such dangers and difficulties you could avoid with sound statistics
>>>>
>>>>We already have sound statistics. It's your OPINION that we don't.
>>>>
>>>>>and certain basic guidelines. You must become independent of such marketing
>>>>>decisions by ChessBase.
>>>>
>>>>Yes, we depend on getting free copies of programs since we don't have the funds
>>>>to buy copies for all our testers.
>>>
>>>Since we have a very open and friendly debate, please could you answer two
>>>points?
>>>
>>>1) Tell me what you think about the message by Mogens Larsen! Please.
>>
>>Could you be more specific?
>
>http://www.talkchess.com/forums/1/message.html?284841

Yes, I've read his message. But he wrote a lot. Is there something specific you
want me to respond to?

Tony

>Rolf Tueschen
>
>
>
>>
>>>2) Let's break a taboo, Tony. Tell me how many testers you have. I have serious
>>>information that it's not higher than 5. Is this correct?
>>
>>No.
>>
>>>Let's face reality.
>>>When I take your published games then I detect only three authors.
>>
>>You should detect four.
>>
>>>So what does
>>>it mean when you talk about "testers".
>>
>>8-10.
>>
>>Tony
>>
>>>Rolf Tueschen
>>>
>>>
>>>>
>>>>>Don't ask me for the details. I am not a member and I was defamed long enough
>>>>>by your colleagues in the staff.
>>>>>
>>>>>Rolf Tueschen
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>>Rolf Tueschen
>>>>>>>>
>>>>>>>>>Then, Tony, I am out of the debate, because I had great respect for your amateur
>>>>>>>>>approach. Comps are not cheap either. etc. To make it clear. I would not oppose
>>>>>>>>>sponsoring. But if you said, but Rolf, look, we have a real number one! That is
>>>>>>>>>the exact result of our statistics. - Then however, I will continue to ask
>>>>>>>>>polite questions.
>>>>>>>>
>>>>>>
>>>>>>The exact result of our statistics is the way Mr. Bean interprets the list.
>>>>>>You chose not to comment on this, why?
>>>>>>
>>>>>>Tony
>>>>>>
>>>>>>>>
>>>>>>>>>Rolf Tueschen
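
Editorial note: the rank ranges in the list quoted above ("1-10 Shredder 7 2801-2737" and so on) appear to follow from the rating intervals alone. The sketch below is one plausible reading, not a method stated in the thread: an engine's best possible rank is one more than the number of engines whose lower bound lies above its upper bound, and its worst possible rank counts every engine (itself included) whose upper bound lies above its lower bound. The names INTERVALS and rank_range are hypothetical, and only the twelve engines shown in the excerpt are included, so worst-case ranks are capped at 12 here.

```python
# Rating intervals (upper, lower) copied from the list quoted in this thread.
# Assumption: only the twelve engines in the excerpt are known, so worst-case
# ranks cannot exceed 12; the full SSDF list had more entries.
INTERVALS = {
    "Shredder 7": (2801, 2737),
    "Deep Fritz 7": (2789, 2732),
    "Fritz 7": (2770, 2711),
    "Shredder 7 UCI": (2761, 2638),
    "Chess Tiger 15": (2753, 2700),
    "Shredder 6 Pad UCI": (2750, 2703),
    "Shredder 6": (2750, 2689),
    "Chess Tiger 14": (2744, 2684),
    "Deep Fritz": (2741, 2680),
    "Gambit Tiger 2": (2739, 2681),
    "Junior 7": (2715, 2659),
    "Hiarcs 8": (2707, 2657),
}


def rank_range(name, intervals):
    """Return (best, worst) rank consistent with the rating intervals.

    best  = 1 + engines whose lower bound is above this engine's upper bound
    worst = engines (itself included) whose upper bound is above this
            engine's lower bound
    Strict inequalities are an assumption; ties are not addressed in the thread.
    """
    upper, lower = intervals[name]
    best = 1 + sum(1 for _, lo in intervals.values() if lo > upper)
    worst = sum(1 for hi, _ in intervals.values() if hi > lower)
    return best, worst


if __name__ == "__main__":
    for engine, (upper, lower) in INTERVALS.items():
        best, worst = rank_range(engine, INTERVALS)
        print(f"{best}-{worst} {engine:<20} {upper}-{lower}")
```

For the first three engines this gives 1-10, 1-10 and 1-11, and the best ranks for Junior 7 and Hiarcs 8 come out as 3 and 4, matching the quoted list. Worst ranks for the lower entries differ from the quoted ones only because the rest of the list is missing here.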

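Editorial note: the questions quoted above about 25 games versus 10 000 games can be given a rough scale. The sketch below is a back-of-the-envelope estimate, not SSDF's or ICCF's actual procedure. It assumes independent games, no draws (draws would reduce the variance), an expected score near 50%, and the logistic Elo curve; the function name elo_margin and its parameters are hypothetical.

```python
import math


def elo_margin(games, score=0.5, z=1.96):
    """Approximate margin of error in Elo points after `games` games.

    Assumptions (not from the thread): independent games, each a win or a
    loss, and the logistic curve expected = 1 / (1 + 10 ** (-diff / 400)).
    The default z = 1.96 corresponds to a 95% interval.
    """
    # Standard error of the observed score fraction (binomial model).
    std_err = math.sqrt(score * (1.0 - score) / games)
    # Slope of the rating difference with respect to the expected score,
    # from diff = 400 * log10(score / (1 - score)).
    elo_per_score = 400.0 / (math.log(10) * score * (1.0 - score))
    # Linear (delta-method) approximation; crude when `games` is small.
    return z * std_err * elo_per_score


if __name__ == "__main__":
    for n in (25, 100, 1000, 10000):
        print(f"{n:>6} games: about +/- {elo_margin(n):.0f} Elo")
```

Under these assumptions 25 games give roughly +/- 136 Elo and 10 000 games roughly +/- 7 Elo. The margin shrinks with the square root of the number of games, so quadrupling the games only halves it.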


Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.