Subject: Re: Proving something is better

Author: Peter Fendrich

Date: 16:28:01 12/19/02

On December 19, 2002 at 02:35:47, Bruce Moreland wrote:

>On December 19, 2002 at 00:58:30, Omid David Tabibi wrote:
>>Based on the presented data:
>>Isn't it clear that vrfd R=3 is superior to std R=2 ?
>No, but it is likely.
>The Neishtadt suite is an odd choice since it contains a great many checkmate
>combinations.  I don't accept that this is a primary component of chess program
>strength.  I accept that VR=3 did better than R=2 on this test set, since the
>number of solutions found was greater in less time.
>There is a table that shows that ECM required fewer nodes to get to depth D, but
>there is no correct solution data given.  I question this.  You took pains to
>present this data in other cases, but it is absent here.  Those numbers would
>have been very interesting.
>WCS is another strange suite, and everything said about the Neishtadt suite can
>be said here.  There appear to be at least 150 mates in the suite.  Everything
>said about the Neishtadt results can be said about these results.
>The mates from the CAP data are the same kind of thing.
>It is as if you've decided what VR=3 can do best, and you are matching it
>against what R=2 is not known to do well.  For some reason, you found three
>suites loaded up with mates, and provided solution data.  Solution data is not
>provided for ECM, a harder suite that contains fewer direct mates.
>The most compelling evidence is the autoplay match where VR=3 scored 68.5%.
>These games are not available online.  I was going to check to see if the
>programs got into a rut and played the same game over and over again, but I
>can't do that.
>Assuming that they played 100 unique games, the question remains as to whether
>68.5% proves anything.  You can say, of course it does, but the real answer has
>to do with statistics.  There is no way that a "real" scientific journal would
>accept "of course it does" as an answer -- they'd want the math.  You don't
>provide the math.
>What are the odds that this result was due to chance?  The paper does not say,
>and unless I wish to speculate, I can draw no conclusion from this other than
>that it seems obvious that there is better than a 50% chance that VR=3 is better
>than R=2.
>Match result math is rarely if ever done in the computer chess field.  Figuring
>out how to do this would be a *great* JICGA article, and it's amazing that
>nobody has felt the need to do this until now.  Being able to make positive
>statements about match scores would be worth something, you'd think, but 40
>years into computer chess research nobody has published this.

I did, some 15-20 years ago, in a couple of articles in the Swedish "PLY" that
later became the basis for the SSDF testing.
A year or so ago you posted a question about how to interpret results with very
few games. In another thread I posted a new theory for this as an answer:
"Match results - a complete(!) theory (long)".
I also wrote a program for this, which can be found at Dann's ftp site.
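
As a rough illustration of the kind of calculation being asked for in the
quoted text ("What are the odds that this result was due to chance?"), here is
a minimal sketch in Python. It is not the theory from the thread mentioned
above, nor the program on Dann's ftp site; it is only a textbook normal
approximation. The function name chance_probability and the draw_ratio
parameter are made up for this example. It assumes the games are independent
(so no repeated games) and that the two programs are equally strong, and
because the paper's win/draw/loss breakdown is not given here, it defaults to
no draws, which gives the largest variance and so the most cautious answer.

```python
import math

def chance_probability(score, games, draw_ratio=0.0):
    """One-sided probability that one of two equally strong programs scores
    at least `score` points in `games` independent games.

    Normal approximation, no continuity correction.
    draw_ratio is an assumed fraction of drawn games (hypothetical input).
    """
    if not 0.0 <= draw_ratio < 1.0:
        raise ValueError("draw_ratio must be in [0, 1)")
    # Under equal strength a game scores 1 or 0 with probability
    # (1 - draw_ratio) / 2 each, and 0.5 with probability draw_ratio,
    # so the per-game variance around the mean 0.5 is (1 - draw_ratio) / 4.
    variance_per_game = (1.0 - draw_ratio) / 4.0
    std_dev = math.sqrt(games * variance_per_game)
    z = (score - games / 2.0) / std_dev
    # Upper tail of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# The match discussed above: 68.5 points, assuming 100 games were played.
p = chance_probability(68.5, 100)
print(f"z-score based chance probability: {p:.2e}")
```

With no draws the standard deviation is 5 points, so 68.5 out of 100 is 3.7
standard deviations above an even score, and the chance probability comes out
at roughly 1 in 10,000. Any draws make the variance smaller and the result
even less likely to be luck. All of this depends on the games really being
100 independent ones, which is the point raised in the quoted text.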

