Computer Chess Club Archives


Subject: Re: Statistical methods and their consequences

Author: Rolf Tueschen

Date: 06:34:41 02/15/03


On February 15, 2003 at 07:08:52, Albert Silver wrote:

>On February 15, 2003 at 04:52:44, David Dory wrote:
>
>>On February 14, 2003 at 13:32:16, Tony Hedlund wrote:
>>
>>>On February 14, 2003 at 09:27:26, Rolf Tueschen wrote:
>>>
>>>>On February 14, 2003 at 08:43:12, Bob Durrett wrote:
>>>>
>>>>>
>>>>>Excellent points.  The "bottom line" is that SSDF presented their findings
>>>>>properly, but the problem is in interpretation.  SSDF cannot be held responsible
>>>>>for errors in interpretation.
>>>>>
>>>>>Bob D.
>>>>
>>>>
>>>>Wrong conclusion. I tried to explain the points, but apparently it's a bit too
>>>>difficult. In short: if you use a statistical method, you are not allowed to
>>>>invent your own presentation. The presentation by SSDF is FALSE. That is the
>>>>point. False and not allowed. Instead of 1., 2., 3., they should say 1.-3.; not
>>>>should, but must, if the differences in the actual results are far smaller than
>>>>the error in the tests themselves. Is that impossible to understand?
>>>>
>>>>Rolf Tueschen
>>>
>>>Then the right presentation is:
>>>
>>>1-10 Shredder 7         2801-2737
>>>1-10 Deep Fritz 7       2789-2732
>>>1-11 Fritz 7            2770-2711
>>>1-2? Shredder 7 UCI     2761-2638
>>>1-15 Chess Tiger 15     2753-2700
>>>1-15 Shredder 6 Pad UCI 2750-2703
>>>1-16 Shredder 6         2750-2689
>>>1-19 Chess Tiger 14     2744-2684
>>>1-19 Deep Fritz         2741-2680
>>>1-19 Gambit Tiger 2     2739-2681
>>>3-2? Junior 7           2715-2659
>>>4-2? Hiarcs 8           2707-2657
>>>
>>>and so on.
>>>
>>>Tony
>>
>>Oh Good Grief!
>>Yes, I have to say I actually agree with Rolf. The SSDF should NOT try to select
>>a number one UNLESS they have played enough games to be sure they have the right
>>program selected, taking into account the margin of error.
>
>I don't agree. The SSDF present their findings and that's it. The findings show
>how well a program did against other programs. After hundreds of games they show
>the *current* rating (it changes as more results are added) of the program as
>well as the number of games, individual results, and the margin of error. The
>results are presented in order from highest to lowest rating. There is no
>'selection' of the top program. What would you have them do? Present it in
>alphabetical order? Furthermore, the best program against humans may easily not
>be the best program against other programs.
>
>                                      Albert


The question "Present them in alphabetical order?" shows a complete lack of
understanding of statistics and also an unwillingness to digest the messages
already posted. I said what should/must be done. This is not up to them; it
follows from the logic of statistics itself. Now that must hurt people who think
that everything is a question of best-selling management. I would never attack
you personally, you might be a fine person, but you have no idea of such
necessities of science. And NO! You can't simply respond "But they are no
scientists!" although this is correct. The point is that you are not allowed to
adopt a certain routine from science and then quickly forget about the clearly
defined context of such routines. I have been trying to make that point for
years now, without much success. And "FIDE lists" is surely no way out! In FIDE
you have at least a relative stability [over the years] of what you want to
measure. But that is exactly why the adoption of Elo doesn't work for the
always-new seasonal flash in the pan. <cough>

Rolf Tueschen



>
>>
>>I'm sure this is a nod in the direction of marketing hype, but for commercial
>>chess programs, the marketing force HAS to be very strong, otherwise the program
>>probably would not exist for long.
>>
>>You have a point Rolf, but it will be buried by market hype, and that's life.
>>The whole SSDF rating work perhaps can best be thought of as a longer tournament
>>- i.e., the strongest program may not win the top spot (because not enough games
>>are played to differentiate all the programs), but that's tournament life.
>>
>>Welcome to SSDF life. All in all, you have to really appreciate their work, if
>>not every little aspect of how they present their findings.
>>
>>
>>Dave
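
For readers who want to check the rank ranges in Tony's table, here is a minimal Python sketch of one way such ranges could be derived. It is an illustration under stated assumptions, not SSDF's or Tony's actual procedure: the two numbers per program are read as the upper and lower bound of the rating interval, only the 12 rows quoted above are used (the full list is longer, so worst ranks here stop at 12), and "intervals do not overlap" is used as a rough stand-in for a proper significance test. The names INTERVALS and rank_range are made up for this example.

```python
# Rating intervals copied from the table quoted above, read as (upper, lower).
# Assumption: only these 12 rows are known; the full SSDF list has more entries.
INTERVALS = {
    "Shredder 7": (2801, 2737),
    "Deep Fritz 7": (2789, 2732),
    "Fritz 7": (2770, 2711),
    "Shredder 7 UCI": (2761, 2638),
    "Chess Tiger 15": (2753, 2700),
    "Shredder 6 Pad UCI": (2750, 2703),
    "Shredder 6": (2750, 2689),
    "Chess Tiger 14": (2744, 2684),
    "Deep Fritz": (2741, 2680),
    "Gambit Tiger 2": (2739, 2681),
    "Junior 7": (2715, 2659),
    "Hiarcs 8": (2707, 2657),
}


def rank_range(name, intervals):
    """Return (best, worst) possible rank of `name` among `intervals`.

    A program is counted as clearly ahead only if its whole interval lies
    above ours, and as clearly behind only if its whole interval lies below.
    """
    upper, lower = intervals[name]
    others = [bounds for other, bounds in intervals.items() if other != name]
    clearly_ahead = sum(1 for (_, other_lower) in others if other_lower > upper)
    clearly_behind = sum(1 for (other_upper, _) in others if other_upper < lower)
    return 1 + clearly_ahead, len(intervals) - clearly_behind


if __name__ == "__main__":
    for program in INTERVALS:
        best, worst = rank_range(program, INTERVALS)
        print(f"{best}-{worst:<3} {program}")
```

On these 12 rows the sketch gives 1-10 for Shredder 7 and Deep Fritz 7, 1-11 for Fritz 7, and best ranks of 3 and 4 for Junior 7 and Hiarcs 8, which agrees with the quoted table. The worst ranks of the lower entries cannot be reproduced without the rest of the list.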



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.