Computer Chess Club Archives


Subject: Re: Statistical methods and their consequences

Author: Albert Silver

Date: 04:08:52 02/15/03


On February 15, 2003 at 04:52:44, David Dory wrote:

>On February 14, 2003 at 13:32:16, Tony Hedlund wrote:
>
>>On February 14, 2003 at 09:27:26, Rolf Tueschen wrote:
>>
>>>On February 14, 2003 at 08:43:12, Bob Durrett wrote:
>>>
>>>>
>>>>Excellent points.  The "bottom line" is that SSDF presented their findings
>>>>properly, but the problem is in interpretation.  SSDF cannot be held responsible
>>>>for errors in interpretation.
>>>>
>>>>Bob D.
>>>
>>>
>>>Wrong conclusion. I tried to explain the points but apparently it's a bit too
>>>difficult. In short: if you use a system of statistics, you are not allowed to
>>>make your own presentation. The presentation by SSDF is FALSE. That is the
>>>point. False and unallowed. Instead of 1., 2., 3., they should say 1.-3. (not
>>>should, but must) if the differences in the actual results are far smaller than
>>>the error in the tests themselves. Is that impossible to understand?
>>>
>>>Rolf Tueschen
>>
>>Then the right presentation is:
>>
>>1-10 Shredder 7         2801-2737
>>1-10 Deep Fritz 7       2789-2732
>>1-11 Fritz 7            2770-2711
>>1-2? Shredder 7 UCI     2761-2638
>>1-15 Chess Tiger 15     2753-2700
>>1-15 Shredder 6 Pad UCI 2750-2703
>>1-16 Shredder 6         2750-2689
>>1-19 Chess Tiger 14     2744-2684
>>1-19 Deep Fritz         2741-2680
>>1-19 Gambit Tiger 2     2739-2681
>>3-2? Junior 7           2715-2659
>>4-2? Hiarcs 8           2707-2657
>>
>>and so on.
>>
>>Tony
>
>Oh Good Grief!
>Yes, I have to say I actually agree with Rolf. The SSDF should NOT try to select
>a number one UNLESS they have played enough games to be sure they have the right
>program selected, taking into account the margin of error.

I don't agree. The SSDF present their findings and that's it. The findings show
how well a program did against other programs. After hundreds of games they show
the *current* rating (it changes as more results are added) of the program as
well as the number of games, individual results, and the margin of error. The
results are listed in order from the highest rating to the lowest. There is no
'selection' of the top program. What would you have them do? Present it in
alphabetical order? Furthermore, the best program against humans may well not
be the best program against other programs.

                                      Albert

>
>I'm sure this is a nod in the direction of marketing hype, but for commercial
>chess programs, the marketing force HAS to be very strong, otherwise the program
>probably would not exist for long.
>
>You have a point Rolf, but it will be buried by market hype, and that's life.
>The whole SSDF rating work can perhaps best be thought of as a longer tournament
>- i.e., the strongest program may not win the top spot (because not enough games
>are played to differentiate all the programs), but that's tournament life.
>
>Welcome to SSDF life. All in all, you have to really appreciate their work, if
>not every little aspect of how they present their findings.
>
>
>Dave
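
The rank ranges in Tony's quoted table appear to follow from overlapping rating intervals: a program's best possible rank is one more than the number of programs whose lower bound lies above its upper bound, and its worst possible rank is one more than the number of programs whose upper bound lies above its lower bound. A minimal Python sketch of that reading follows. The rule itself is an assumption (Tony does not state how he derived the ranges), and the names `INTERVALS`, `rank_ranges`, and `RANGES` are made up for illustration. Because the quoted list stops at "and so on", worst-case ranks computed from these twelve entries alone are only lower bounds on the true values.

```python
# Rating intervals (upper bound, lower bound) copied from the quoted table.
# The table is truncated there ("and so on"), so the worst-case ranks
# computed below can only be lower bounds on the full-list values.
INTERVALS = {
    "Shredder 7": (2801, 2737),
    "Deep Fritz 7": (2789, 2732),
    "Fritz 7": (2770, 2711),
    "Shredder 7 UCI": (2761, 2638),
    "Chess Tiger 15": (2753, 2700),
    "Shredder 6 Pad UCI": (2750, 2703),
    "Shredder 6": (2750, 2689),
    "Chess Tiger 14": (2744, 2684),
    "Deep Fritz": (2741, 2680),
    "Gambit Tiger 2": (2739, 2681),
    "Junior 7": (2715, 2659),
    "Hiarcs 8": (2707, 2657),
}


def rank_ranges(intervals):
    """Return {name: (best_rank, worst_rank)} from overlapping intervals.

    Assumed rule (not stated in the thread): a rival is certainly ahead
    if its lower bound exceeds our upper bound, and possibly ahead if its
    upper bound exceeds our lower bound.
    """
    ranges = {}
    for name, (high, low) in intervals.items():
        others = [bounds for other, bounds in intervals.items() if other != name]
        certainly_ahead = sum(1 for _, o_low in others if o_low > high)
        possibly_ahead = sum(1 for o_high, _ in others if o_high > low)
        ranges[name] = (1 + certainly_ahead, 1 + possibly_ahead)
    return ranges


RANGES = rank_ranges(INTERVALS)

for name, (best, worst) in RANGES.items():
    print(f"{best}-{worst} {name}")
```

On these twelve entries the sketch gives 1-10 for Shredder 7 and Deep Fritz 7, 1-11 for Fritz 7, and best ranks of 3 and 4 for Junior 7 and Hiarcs 8, which agrees with the quoted table wherever the truncation does not matter.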



Last modified: Thu, 15 Apr 21 08:11:13 -0700
