Computer Chess Club Archives


Subject: Re: Statistical methods and their consequences

Author: David Dory

Date: 01:52:44 02/15/03


On February 14, 2003 at 13:32:16, Tony Hedlund wrote:

>On February 14, 2003 at 09:27:26, Rolf Tueschen wrote:
>
>>On February 14, 2003 at 08:43:12, Bob Durrett wrote:
>>
>>>
>>>Excellent points.  The "bottom line" is that SSDF presented their findings
>>>properly, but the problem is in interpretation.  SSDF cannot be held responsible
>>>for errors in interpretation.
>>>
>>>Bob D.
>>
>>
>>Wrong conclusion. I tried to explain the points but apparently it's a bit too
>>difficult. In short: if you use a system of statistics you are not allowed to
>>make your own presentation. The presentation by SSDF is FALSE. That is the
>>point. False and not permitted. Instead of 1., 2., 3., they should say 1.-3.,
>>not should, but must, if the differences in the actual results are way smaller
>>than the error in the tests themselves. Is that impossible to understand?
>>
>>Rolf Tueschen
>
>Then the right presentation is:
>
>1-10 Shredder 7         2801-2737
>1-10 Deep Fritz 7       2789-2732
>1-11 Fritz 7            2770-2711
>1-2? Shredder 7 UCI     2761-2638
>1-15 Chess Tiger 15     2753-2700
>1-15 Shredder 6 Pad UCI 2750-2703
>1-16 Shredder 6         2750-2689
>1-19 Chess Tiger 14     2744-2684
>1-19 Deep Fritz         2741-2680
>1-19 Gambit Tiger 2     2739-2681
>3-2? Junior 7           2715-2659
>4-2? Hiarcs 8           2707-2657
>
>and so on.
>
>Tony

Oh Good Grief!
Yes, I have to say I actually agree with Rolf. The SSDF should NOT try to select
a number one UNLESS they have played enough games to be sure they have the right
program selected, taking into account the margin of error.
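
For anyone who wants to check Tony's rank ranges, here is a minimal sketch in
Python. It assumes a program's best possible rank is 1 plus the number of
programs whose whole interval lies above its own, and its worst possible rank is
1 plus the number of programs whose upper bound exceeds its lower bound. The
names INTERVALS and rank_range are mine, not anything SSDF uses, and only the
twelve programs quoted above are included, so the worst ranks come out smaller
than Tony's for entries whose range reaches further down the full list.

```python
# Rating intervals (upper, lower) copied from Tony's list above.
# Only the twelve quoted programs are included (an assumption of this sketch).
INTERVALS = {
    "Shredder 7": (2801, 2737),
    "Deep Fritz 7": (2789, 2732),
    "Fritz 7": (2770, 2711),
    "Shredder 7 UCI": (2761, 2638),
    "Chess Tiger 15": (2753, 2700),
    "Shredder 6 Pad UCI": (2750, 2703),
    "Shredder 6": (2750, 2689),
    "Chess Tiger 14": (2744, 2684),
    "Deep Fritz": (2741, 2680),
    "Gambit Tiger 2": (2739, 2681),
    "Junior 7": (2715, 2659),
    "Hiarcs 8": (2707, 2657),
}


def rank_range(name, intervals):
    """Return (best, worst) rank consistent with the rating intervals."""
    high, low = intervals[name]
    others = [v for k, v in intervals.items() if k != name]
    # Programs that are certainly stronger: their lower bound beats our upper.
    best = 1 + sum(1 for (_, o_low) in others if o_low > high)
    # Programs that might be stronger: their upper bound beats our lower.
    worst = 1 + sum(1 for (o_high, _) in others if o_high > low)
    return best, worst


if __name__ == "__main__":
    for name in INTERVALS:
        best, worst = rank_range(name, INTERVALS)
        print(f"{best}-{worst:<3} {name}")
```

On the quoted data this gives the same ranges as Tony's list for the top three
programs and the same best ranks for Junior 7 and Hiarcs 8.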

I'm sure naming a number one is a nod in the direction of marketing hype, but
for commercial chess programs, the marketing force HAS to be very strong,
otherwise the program probably would not exist for long.

You have a point, Rolf, but it will be buried by market hype, and that's life.
The whole SSDF rating work can perhaps best be thought of as a longer
tournament, i.e., the strongest program may not win the top spot (because not
enough games are played to differentiate all the programs), but that's
tournament life.
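
To get a feel for how many games "enough" might be, here is a rough sketch in
Python. It is not how SSDF computes its error margins. It assumes a single
head-to-head match between two programs, the standard Elo expected-score
formula, a normal approximation with z = 1.96 (roughly 95%), and a worst-case
per-game standard deviation of 0.5 (no draws), so it overestimates when many
games are drawn. The function names are hypothetical.

```python
import math


def expected_score(elo_diff):
    """Standard Elo expected score for the stronger side of a rating gap."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def games_needed(elo_diff, z=1.96, per_game_sd=0.5):
    """Rough number of games before the score edge exceeds the error margin.

    Assumptions of this sketch: normal approximation, z = 1.96, and a
    per-game standard deviation of 0.5 (the no-draws worst case).
    """
    edge = expected_score(abs(elo_diff)) - 0.5
    if edge <= 0:
        raise ValueError("a zero rating gap can never be resolved")
    # Need z * sd / sqrt(n) <= edge, so n >= (z * sd / edge) ** 2.
    return math.ceil((z * per_game_sd / edge) ** 2)


if __name__ == "__main__":
    for gap in (10, 25, 50, 100):
        print(gap, games_needed(gap))
```

The required number of games grows quickly as the rating gap shrinks, which is
why closely rated programs at the top of a list are so hard to separate.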

Welcome to SSDF life. All in all, you have to really appreciate their work, if
not every little aspect of how they present their findings.


Dave



Last modified: Thu, 15 Apr 21 08:11:13 -0700
