Computer Chess Club Archives


Subject: Re: Statistical methods and their consequences

Author: Rolf Tueschen

Date: 03:29:23 02/17/03


On February 16, 2003 at 13:21:39, Tony Hedlund wrote:

>On February 15, 2003 at 07:12:10, Rolf Tueschen wrote:
>
>>On February 15, 2003 at 05:24:43, Tony Hedlund wrote:
>>
>>>On February 14, 2003 at 16:27:31, Rolf Tueschen wrote:
>>>
>>>>On February 14, 2003 at 13:32:16, Tony Hedlund wrote:
>>>>
>>>>>On February 14, 2003 at 09:27:26, Rolf Tueschen wrote:
>>>>>
>>>>>>On February 14, 2003 at 08:43:12, Bob Durrett wrote:
>>>>>>
>>>>>>>
>>>>>>>Excellent points.  The "bottom line" is that SSDF presented their findings
>>>>>>>properly, but the problem is in interpretation.  SSDF cannot be held responsible
>>>>>>>for errors in interpretation.
>>>>>>>
>>>>>>>Bob D.
>>>>>>
>>>>>>
>>>>>>Wrong conclusion. I tried to explain the points, but apparently it's a bit too
>>>>>>difficult. In short: if you use a system of statistics, you are not allowed to
>>>>>>make up your own presentation. The presentation by SSDF is FALSE. That is the
>>>>>>point. False and not permissible. Instead of 1., 2., 3., they should say 1.-3.;
>>>>>>not should, but must, if the differences in the actual results are way smaller
>>>>>>than the error of the tests themselves. Is that impossible to understand?
>>>>>>
>>>>>>Rolf Tueschen
>>>>>
>>>>>Then the right presentation is:
>>>>>
>>>>>1-10 Shredder 7         2801-2737
>>>>>1-10 Deep Fritz 7       2789-2732
>>>>>1-11 Fritz 7            2770-2711
>>>>>1-2? Shredder 7 UCI     2761-2638
>>>>>1-15 Chess Tiger 15     2753-2700
>>>>>1-15 Shredder 6 Pad UCI 2750-2703
>>>>>1-16 Shredder 6         2750-2689
>>>>>1-19 Chess Tiger 14     2744-2684
>>>>>1-19 Deep Fritz         2741-2680
>>>>>1-19 Gambit Tiger 2     2739-2681
>>>>>3-2? Junior 7           2715-2659
>>>>>4-2? Hiarcs 8           2707-2657
>>>>>
>>>>>and so on.
>>>>>
>>>>>Tony
>>>>
>>>>Thanks for the fine joke, Tony. Perhaps you have put your finger on the sore spot!
>>>>You want to have a number one, right? Then you run tests, just as you do,
>>>>fair and correct. And then you reach the stage where you must evaluate your
>>>>results. You see that you have no clear number one. Now there are two possibilities:
>>>>
>>>>1) You switch into decisive mode and run further tests; the "list" date can wait.
>>>>
>>>>2) You stick to your traditions and come out with your list. But then, please,
>>>>present the list neither in the classical way nor in your joking Mr. Bean
>>>>version, but simply make packages like these:
>>>>
>>>>1.-3. A B C
>>>>4.-5. D E
>>>>6.    F
>>>>7.-10. G H I
>>>>etc.
>>>>
>>>>Please tell me, where is the problem with this method?
>>>
>>>Why just the three strongest engines? Given the margins of error, Gambit Tiger 2
>>>could be as strong as the other top engines. I find Mr. Bean's version more
>>>logical than yours. Could you please explain your method further?
>>
>>
>>SSDF has good statistics experts. Consult these experts and you will understand
>>why Gambit Tiger 2 could NOT be number one. My first three were a pool in which
>>each could be number one. Only Shredder 7 UCI could possibly be added, but my
>>example was more a demonstration of such a list. It's not MY method. It's simply
>>what careful researchers would do if they had your results. Perhaps you don't
>>know it, Tony, but the presentation of the results must be grounded in the results.
>
>What exactly do you propose SSDF should do? Give me a clear example of how you
>would present the data. Don't give me this A, B and C. You have the results;
>which programs are A, B and C?
>
>>In other words it might well be that one day you will have a clear number one.
>
>So the bottom line is that we can claim a number one only when we reach a margin
>of error close to zero? When will that happen? After 10,000 games by each
>entrant?
>
>>Or do you believe that your method guarantees the eternal status quo?
>>
>>
>>
>>>
>>>>Is it because you have
>>>>a kind of strong wish to present a number one by all means?
>>>
>>>Do you also think that FIDE shouldn't have a number one on their list? Is
>>>Kasparov really the best player?
>>
>>Please do not seek outside help when you run out of arguments in favor of
>>your own presentation.
>
>FIDE, ICCF and SSDF all have a rating list. And we all use Professor Arpad Elo's
>method of measuring strength in chess. And yes, I argue for our way of presentation.
>ICCF's number one, Ulf Andersson, has played 25 games! Figure the margin of error
>there. They probably don't have any careful researchers.
>
>>>
>>>>Please let's simply
>>>>discuss this little topic. If you tell me: "Listen, Rolf, I am not allowed to
>>>>tell you, but you are right, a number one program is very important for us."
>>>
>>>It seems to be more important to others.
>>
>>Yes, that was my deeper assumption. Could you give more details?
>
>Details?
>People here at CCC seem to be looking forward to our next list, to see which is
>number one. And then they congratulate the programmer. And of course the
>commercial vendors use it in their advertising, as they always have. When we
>started our list, it was as a complement to our reviews of new programs.
>Personally I'm not interested in which program is number one. I'm more interested
>in how the different engines are playing.

I can well imagine your personal sentiments, and I have great respect for your
efforts with SSDF as a whole, but you can't stop the progress of history. When you
played the games move by move on the old chess computers, your dedication and hard
work were really sensational, and people got results in what was then virgin
territory. Today, with autoplayed games, you have more time to do sound statistics.
However, if the top programs simply do not differ that much, then you can't call
out a number one. Or you play millions of games. But who guarantees that you will
then have a clear first? No, you should accept the actual reality, and that is
equality among the top entries.
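
To put the "millions of games", "10,000 games" and "25 games" remarks into numbers, here is a minimal Python sketch of how the 95% margin on a rating estimate shrinks with the number of games. It is an illustration under loudly stated assumptions, not SSDF's or ICCF's actual procedure: an even 50% score, no draws (draws lower the per-game variance, so real margins are somewhat smaller), independent games, and a normal approximation to the score. The helper names `elo_from_score` and `elo_margin` are hypothetical.

```python
import math

def elo_from_score(p: float) -> float:
    """Rating difference implied by expected score p under the logistic Elo model."""
    return -400.0 * math.log10(1.0 / p - 1.0)

def elo_margin(n_games: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin (in Elo) on a rating estimated from n_games.

    Assumptions (illustrative only): no draws, so each game scores 0 or 1
    with standard deviation sqrt(p * (1 - p)); games are independent; the
    normal approximation to the mean score holds.
    """
    se = math.sqrt(p * (1.0 - p) / n_games)        # standard error of the mean score
    hi = min(p + z * se, 1.0 - 1e-12)               # clamp so log10 stays defined
    return elo_from_score(hi) - elo_from_score(p)   # upper half-width in Elo

for n in (25, 400, 10_000, 1_000_000):
    print(f"{n:>9} games: about +/-{elo_margin(n):.0f} Elo")
```

Under these assumptions the margin is roughly +/-144 Elo after 25 games, +/-34 after 400, +/-7 after 10,000 and under +/-1 after a million. Because it shrinks only with the square root of the number of games, halving it costs four times as many games, and two engines of truly equal strength never separate however long they play.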

You are misled if you think that the gratitude of the CC users was linked to your
presentation of a number one. It was because of your general efforts for the good
of CC. And the business world at that time was very diverse. But today we have a
single important company. Do you want to do your job for them and their marketing
interests, or for the users around the world? You must accept that if statistically
you have no clear first, then you can't present a number one program. Why does that
bother you??? You are independent! But independent does not mean naive. Why don't
you consider the consequences of such strange events: Fritz8 has been out for
months and you don't test it. I read that you are waiting until ChessBase sends you
a copy. But then that would no longer speak for the independence of your tests,
because the timing of the start of testing has always been a factor. You could
avoid all such dangers and difficulties with sound statistics and certain basic
guidelines. You must become independent of such marketing decisions by ChessBase.
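
Whether there is a "clear first" depends on how the rating ranges are compared. Two 95% intervals can overlap even when the difference between two engines is significant, because the margin on a difference of two independent estimates is sqrt(a^2 + b^2), not a + b. The Python sketch below checks each of the twelve entries quoted in the thread against the leader that way. It assumes, and these are assumptions rather than SSDF's documented procedure, that each quoted range is a symmetric 95% interval around the rating and that the estimates are independent (in reality the engines play each other, so they are not fully independent). The helper `contenders_for_first` is hypothetical.

```python
import math

# The top entries quoted in the thread as (name, upper bound, lower bound).
# Assumption: each range is a symmetric 95% interval around the rating.
ENTRIES = [
    ("Shredder 7", 2801, 2737),
    ("Deep Fritz 7", 2789, 2732),
    ("Fritz 7", 2770, 2711),
    ("Shredder 7 UCI", 2761, 2638),
    ("Chess Tiger 15", 2753, 2700),
    ("Shredder 6 Pad UCI", 2750, 2703),
    ("Shredder 6", 2750, 2689),
    ("Chess Tiger 14", 2744, 2684),
    ("Deep Fritz", 2741, 2680),
    ("Gambit Tiger 2", 2739, 2681),
    ("Junior 7", 2715, 2659),
    ("Hiarcs 8", 2707, 2657),
]

def contenders_for_first(entries):
    """Return the engines whose gap to the leader is not significant at 95%.

    Hypothetical helper: rating = interval midpoint, margin = half-width,
    and the margin on a difference of two independent estimates is
    hypot(m1, m2), which is smaller than the sum m1 + m2 that a plain
    interval-overlap check implicitly uses.
    """
    stats = [(name, (hi + lo) / 2, (hi - lo) / 2) for name, hi, lo in entries]
    _, top_rating, top_margin = max(stats, key=lambda s: s[1])
    pool = []
    for name, rating, margin in stats:
        gap = top_rating - rating
        if gap <= math.hypot(top_margin, margin):
            pool.append(name)
    return pool

print(contenders_for_first(ENTRIES))
```

On these figures the pool is Shredder 7, Deep Fritz 7 and Fritz 7, with Shredder 7 UCI missing the cut by well under one Elo point; a plain overlap check would instead admit all ten engines whose upper bound reaches 2737. This does not show that anyone in the thread used exactly this rule, only that the choice of comparison decides how large the "first package" is.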

Don't ask me for the details. I am not a member, and I was defamed long enough
by your colleagues on the staff.

Rolf Tueschen

>
>>Rolf Tueschen
>>>
>>>>Then, Tony, I am out of the debate, because I had great respect for your amateur
>>>>approach. Computers are not cheap either, etc. To make it clear: I would not
>>>>oppose sponsoring. But if you said, "But Rolf, look, we have a real number one!
>>>>That is the exact result of our statistics," then I will continue to ask
>>>>polite questions.
>>>
>
>The exact result of our statistics is the way Mr. Bean interprets the list.
>You chose not to comment on this. Why?
>
>Tony
>
>>>
>>>>Rolf Tueschen
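
The "Mr. Bean" rank ranges in Tony's table (for example "1-10 Shredder 7 2801-2737") can be reproduced mechanically from the quoted rating ranges. The Python sketch below uses one plausible rule, which is an assumption since the thread does not state it: an engine's best possible rank is one plus the number of engines whose lower bound lies above its upper bound, and its worst possible rank is one plus the number of other engines whose upper bound lies above its lower bound. Only the twelve entries quoted in the thread are included, so worst ranks for the lower entries come out truncated compared with the full list (12 instead of 15 for Chess Tiger 15, for instance). The helper `rank_ranges` is hypothetical.

```python
# (name, upper bound, lower bound) for the entries quoted in the thread.
ENTRIES = [
    ("Shredder 7", 2801, 2737),
    ("Deep Fritz 7", 2789, 2732),
    ("Fritz 7", 2770, 2711),
    ("Shredder 7 UCI", 2761, 2638),
    ("Chess Tiger 15", 2753, 2700),
    ("Shredder 6 Pad UCI", 2750, 2703),
    ("Shredder 6", 2750, 2689),
    ("Chess Tiger 14", 2744, 2684),
    ("Deep Fritz", 2741, 2680),
    ("Gambit Tiger 2", 2739, 2681),
    ("Junior 7", 2715, 2659),
    ("Hiarcs 8", 2707, 2657),
]

def rank_ranges(entries):
    """Best and worst possible rank for each engine under interval overlap.

    Hypothetical rule: engine X is surely behind Y when Y's lower bound is
    above X's upper bound, and could be behind Y whenever Y's upper bound
    is above X's lower bound.
    """
    ranges = {}
    for name, hi, lo in entries:
        others = [(o_hi, o_lo) for o_name, o_hi, o_lo in entries if o_name != name]
        surely_ahead = sum(1 for o_hi, o_lo in others if o_lo > hi)
        possibly_ahead = sum(1 for o_hi, o_lo in others if o_hi > lo)
        ranges[name] = (1 + surely_ahead, 1 + possibly_ahead)
    return ranges

for name, (best, worst) in rank_ranges(ENTRIES).items():
    print(f"{best}-{worst} {name}")
```

For the entries that can be checked against the quoted table (1-10, 1-10 and 1-11 for the top three, and best ranks of 3 and 4 for Junior 7 and Hiarcs 8) this rule gives the same numbers. Because it compares whole intervals rather than the rating difference itself, it is more permissive than a significance test on the gap between two engines.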



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.