Computer Chess Club Archives



Subject: Re: SSDF Rating list

Author: Dann Corbit

Date: 17:16:39 12/27/01



On December 27, 2001 at 19:56:59, Mark Young wrote:

>Dann, don't go nuts; many of us understand the stats. We are also smart enough to
>know that the range given could be greater or smaller than what is shown. It
>depends on what degree of confidence you are looking for in the stats.
>
>It could also be that Crafty 18.12 is the strongest program on the SSDF list, or
>many other programs with lower ratings. That's the thing with stats: you can never
>be 100% sure with 100% confidence. You can only go with what is most likely.

Crafty 18.12 lies well outside the range of Chess Tiger's error bars.  So (for
instance) under the conditions of the test, if the hypothesis were:
"Crafty 18.12 run on Athlon 1200 under Autoplayer is as strong as Chess Tiger
14.0 CB on Athlon 1200" then the hypothesis would be rejected.  In fact, why
don't we look at the numbers:
   # Program                               Rating    +     - Games   Won  Oppo
   1 Chess Tiger 14.0 CB 256MB Athlon 1200   2715   38   -36   378   66%  2600
[snip]
  14 Crafty 18.12/CB 256MB  Athlon 1200 MHz  2601   44   -43   261   53%  2577

2601 + 44 = 2645 (the upper bound of Crafty's strength under the conditions of
the experiment)
2715 - 36 = 2679 (the lower bound of CT 14's strength).

Since 2645 is below 2679, the two ranges do not overlap.  Hence, we can say with
confidence that, within the precision of the error bars, Crafty is not as strong
as CT.  Let's suppose that the error bars span only one standard deviation.  Then
we have a probability of 2/3 or better that CT is stronger.
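
The comparison above can be written as a small check.  This is only a sketch in
Python: the `Entry` type and the `intervals_overlap` helper are names made up for
illustration, and the only figures used are the two rows quoted from the list.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """One row of the rating list (hypothetical structure, for illustration)."""
    name: str
    rating: int
    plus: int    # upper error margin
    minus: int   # lower error margin, stored as a positive number

    @property
    def low(self) -> int:
        return self.rating - self.minus

    @property
    def high(self) -> int:
        return self.rating + self.plus


def intervals_overlap(a: Entry, b: Entry) -> bool:
    """True if the two rating ranges share at least one point."""
    return a.low <= b.high and b.low <= a.high


tiger = Entry("Chess Tiger 14.0 CB", 2715, 38, 36)
crafty = Entry("Crafty 18.12/CB", 2601, 44, 43)

print(crafty.high)                       # 2645
print(tiger.low)                         # 2679
print(intervals_overlap(tiger, crafty))  # False
```

Non-overlapping ranges are the easy case.  When the ranges do overlap, this check
says nothing about which program is stronger.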

With the figures for CT versus DF, the difference in strength is statistically
insignificant.  We cannot say (based on the available data) that CT is
stronger than DF.  It might be (or vice versa) but it has not been shown.  This
is the most difficult case: two sets of measurements that are almost
identical.  It would take far more data than anyone is willing to generate to
settle this issue.
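
To give a rough feel for "far more data": if one assumes that the error margin
shrinks in proportion to 1/sqrt(games) (a common rule of thumb for sampling
error, not something taken from the SSDF documentation), the number of games
needed for a narrower margin can be estimated.  The function name and the
10-point target below are hypothetical; the 378 games and the roughly 37-point
margin come from the Chess Tiger row quoted earlier.

```python
import math


def games_needed(current_games: int, current_margin: float,
                 target_margin: float) -> int:
    """Estimate the games required to reach target_margin.

    Assumption: margin is proportional to 1 / sqrt(games), so halving
    the margin takes about four times as many games.
    """
    if current_games <= 0 or current_margin <= 0 or target_margin <= 0:
        raise ValueError("all arguments must be positive")
    return math.ceil(current_games * (current_margin / target_margin) ** 2)


# Chess Tiger row: 378 games, margins +38 / -36, so roughly 37 either way.
# The 10-point target is an arbitrary illustration.
print(games_needed(378, 37, 10))  # 5175
```

Under that assumption, separating two programs only a few points apart would
take thousands of games per program.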

You are (apparently) still insisting that the SSDF results show Chess Tiger to
be stronger, which clearly shows that you do not understand what the table
means.  The table definitely does *not* show that CT is stronger than DF.  To
state otherwise is a clear demonstration that you do not know what the numbers
mean.

Of course CT *might* be stronger.  It's just that the numbers do not show that.
What they show is parity.  Your interpretation of their meaning is just plain
wrong.



