Computer Chess Club Archives


Subject: Re: Statistical methods and their consequences

Author: Tony Hedlund

Date: 10:21:39 02/16/03



On February 15, 2003 at 07:12:10, Rolf Tueschen wrote:

>On February 15, 2003 at 05:24:43, Tony Hedlund wrote:
>
>>On February 14, 2003 at 16:27:31, Rolf Tueschen wrote:
>>
>>>On February 14, 2003 at 13:32:16, Tony Hedlund wrote:
>>>
>>>>On February 14, 2003 at 09:27:26, Rolf Tueschen wrote:
>>>>
>>>>>On February 14, 2003 at 08:43:12, Bob Durrett wrote:
>>>>>
>>>>>>
>>>>>>Excellent points.  The "bottom line" is that SSDF presented their findings
>>>>>>properly, but the problem is in interpretation.  SSDF cannot be held responsible
>>>>>>for errors in interpretation.
>>>>>>
>>>>>>Bob D.
>>>>>
>>>>>
>>>>>Wrong conclusion. I tried to explain the points but apparently it's a bit too
>>>>>difficult. In short : If you use a system of statistics you are not allowed to
>>>>>make your own presentation. The presentation by SSDF is FALSE. That is the
>>>>>point. False and unallowed. Instead of 1., 2., 3., they should say 1.-3., not
>>>>>should, but must, if the differences in the actual results are way smaller than
>>>>>the error in the tests themselves. Is that impossible to understand?
>>>>>
>>>>>Rolf Tueschen
>>>>
>>>>Then the right presentation is:
>>>>
>>>>1-10 Shredder 7         2801-2737
>>>>1-10 Deep Fritz 7       2789-2732
>>>>1-11 Fritz 7            2770-2711
>>>>1-2? Shredder 7 UCI     2761-2638
>>>>1-15 Chess Tiger 15     2753-2700
>>>>1-15 Shredder 6 Pad UCI 2750-2703
>>>>1-16 Shredder 6         2750-2689
>>>>1-19 Chess Tiger 14     2744-2684
>>>>1-19 Deep Fritz         2741-2680
>>>>1-19 Gambit Tiger 2     2739-2681
>>>>3-2? Junior 7           2715-2659
>>>>4-2? Hiarcs 8           2707-2657
>>>>
>>>>and so on.
>>>>
>>>>Tony
>>>
>>>Thanks for the fine joke, Tony. Perhaps you lay your finger into the wound!
>>>You want to have a number one, right? Then you make tests, just like you do,
>>>fair and correct. And then you come into the period where you must evaluate your
>>>results. You see that you have no clear number one. Now two possibilities:
>>>
>>>1) You go on into decisive mode and do further tests, the "list" date can wait.
>>>
>>>2) You stick to your traditions and publish your list. But then, please, do
>>>NOT present the list either in the classical way or in your joking Mr. Bean
>>>version, but simply make such packages:
>>>
>>>1.-3. A B C
>>>4.-5. D E
>>>6.    F
>>>7.-10. G H I
>>>etc.
>>>
>>>Tell me please, where the problem is with this method?
>>
>>Why just the three strongest engines? With the margin of error, Gambit Tiger 2
>>could be as strong as the other top engines. I find Mr. Bean's version more
>>logical than yours. Could you please explain your method further?
>
>
>SSDF has good statistics experts. Consult these experts and you will understand
>why Gambit Tiger 2 could NOT be number one. My first three was a pool where all
>could be number one. Only Shredder 7 UCI could be included, but my example was
>more a demonstration of such a list. It's not MY method. It's simply what
>careful researchers would do if they had your results. Perhaps you don't know
>it, Tony, but the presentation of the results must have a base in the results.

What do you propose SSDF do exactly? Give me a clear example of how you would
present the data. Don't give me this A, B and C. You have the results; which
programs are A, B and C?

>In other words it might well be that one day you will have a clear number one.

The bottom line is that when we reach a margin of error close to zero, then we
can claim a number one? When will that happen? After 10,000 games by each
entrant?
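
As a rough illustration of how slowly the margin of error shrinks with more
games, here is a minimal Python sketch. It is not SSDF's actual calculation: it
assumes two equally matched opponents, independent games, a normal approximation
for the mean score, and the logistic form of the Elo curve. The function name
elo_margin and the parameters draw_ratio and z are made up for this example.

```python
import math


def elo_margin(games, draw_ratio=0.0, z=1.96):
    """Approximate one-sided Elo margin of error after `games` games.

    Assumptions (not taken from the SSDF list): both sides are equally
    strong, games are independent, the mean score is roughly normal,
    and ratings follow the logistic Elo curve.
    """
    if games <= 0:
        raise ValueError("games must be positive")
    if not 0.0 <= draw_ratio <= 1.0:
        raise ValueError("draw_ratio must be between 0 and 1")
    # Per-game score spread for an even match: wins and losses each occur
    # with probability (1 - draw_ratio) / 2, and a draw scores 0.5.
    per_game_sd = math.sqrt(0.25 * (1.0 - draw_ratio))
    # Upper end of the confidence interval for the mean score.
    upper_score = 0.5 + z * per_game_sd / math.sqrt(games)
    if upper_score >= 1.0:
        return math.inf  # too few games for a finite bound
    # Convert the expected score into a rating difference.
    return -400.0 * math.log10(1.0 / upper_score - 1.0)


for n in (25, 100, 1000, 10000):
    print(n, round(elo_margin(n), 1))
```

Under these assumptions the margin falls roughly with the square root of the
number of games: it is well over 100 Elo points at 25 games and still several
points at 10,000 games, so it gets smaller but never reaches zero.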

>Or do you believe that your method guarantees the eternal status quo?
>
>
>
>>
>>>Is it because you have
>>>a kind of strong wish to present a number one by all means?
>>
>>Do you also think that FIDE shouldn't have a number one on their list? Is
>>Kasparov really the best player?
>
>Please do not seek for outside help, when you run out of arguments in favor of
>your own presentation.

FIDE, ICCF and SSDF all have a rating list. And we all use Professor Arpad Elo's
method of measuring strength in chess. And yes, I argue for our way of
presentation. ICCF's number one, Ulf Andersson, has played 25 games! Figure the
margin of error there. They probably don't have any careful researchers.

>>
>>>Please let's simply
>>>discuss this little topic. If you tell me, listen, Rolf, I am not allowed to
>>>tell you, but you are right, that a number one prog is very important for us.
>>
>>It seems to be more important to others.
>
>Yes, that was my deeper assumption. Could you give more details?

Details?
People here at CCC seem to be looking forward to our next list, to see which is
number one. And then they congratulate the programmer. And of course the
commercial companies use it in their advertising, as they always have. When we
started our list, it was as a complement to our reviews of new programmes.
Personally I'm not interested in which program is number one. I'm more interested
in how the different engines are playing.

>Rolf Tueschen
>>
>>>Then, Tony, I am out of the debate, because I had great respect for your amateur
>>>approach. Comps are not cheap either. etc. To make it clear. I would not oppose
>>>sponsoring. But if you said, but Rolf, look, we have a real number one! That is
>>>the exact result of our statistics. - Then however, I will continue to ask
>>>polite questions.
>>

The exact result of our statistics is the way Mr. Bean interprets the list.
You chose not to comment on this. Why?
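
The rank ranges in the list quoted earlier in this post can be reproduced from
the rating intervals alone. The following Python sketch shows one way to do it,
assuming an engine's best possible rank is one more than the number of engines
whose whole interval lies above its own, and its worst possible rank is one more
than the number of engines whose interval reaches above its lower bound. The
name rank_ranges is made up for this example, and only the twelve engines quoted
above are included, so the worst ranks stop at 12 instead of running further
down the full list.

```python
# (name, upper bound, lower bound) as quoted earlier in this thread.
INTERVALS = [
    ("Shredder 7", 2801, 2737),
    ("Deep Fritz 7", 2789, 2732),
    ("Fritz 7", 2770, 2711),
    ("Shredder 7 UCI", 2761, 2638),
    ("Chess Tiger 15", 2753, 2700),
    ("Shredder 6 Pad UCI", 2750, 2703),
    ("Shredder 6", 2750, 2689),
    ("Chess Tiger 14", 2744, 2684),
    ("Deep Fritz", 2741, 2680),
    ("Gambit Tiger 2", 2739, 2681),
    ("Junior 7", 2715, 2659),
    ("Hiarcs 8", 2707, 2657),
]


def rank_ranges(intervals):
    """Map each engine name to (best possible rank, worst possible rank)."""
    ranges = {}
    for name, upper, lower in intervals:
        # Engines whose whole interval lies above ours must rank ahead of us.
        surely_ahead = sum(
            1 for other, u, l in intervals if other != name and l > upper
        )
        # Engines whose interval reaches above our lower bound might rank
        # ahead of us.
        maybe_ahead = sum(
            1 for other, u, l in intervals if other != name and u > lower
        )
        ranges[name] = (1 + surely_ahead, 1 + maybe_ahead)
    return ranges


for engine, (best, worst) in rank_ranges(INTERVALS).items():
    print(f"{best}-{worst} {engine}")
```

For the engines whose ranges fall entirely within the quoted twelve, this gives
the same ranges as the list above, for example 1-10 for Shredder 7 and 1-11 for
Fritz 7, and the same best ranks of 3 for Junior 7 and 4 for Hiarcs 8.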

Tony

>>
>>>Rolf Tueschen




Last modified: Thu, 15 Apr 21 08:11:13 -0700
