### Subject: Re: Proving something is better

Author: Bruce Moreland

Date: 23:35:47 12/18/02

```On December 19, 2002 at 00:58:30, Omid David Tabibi wrote:

>Based on the presented data:
>
>Isn't it clear that vrfd R=3 is superior to std R=2 ?

No, but it is likely.

The Neishtadt suite is an odd choice since it contains a great many checkmate
combinations.  I don't accept that this is a primary component of chess program
strength.  I accept that VR=3 did better than R=2 on this test set, since the
number of solutions found was greater in less time.

There is a table that shows that ECM required fewer nodes to get to depth D, but
there is no correct solution data given.  I question this.  You took pains to
present this data in other cases, but it is absent here.  Those numbers would
have been very interesting.

WCS is another strange suite, and everything said about the Neishtadt suite can
be said here.  There appear to be at least 150 mates in the suite.

The mates from the CAP data are the same kind of thing.

It is as if you've decided what VR=3 can do best, and you are matching it
against what R=2 is not known to do well.  For some reason, you found three
suites loaded up with mates, and provided solution data.  Solution data is not
provided for ECM, a harder suite that contains fewer direct mates.

The most compelling evidence is the autoplay match where VR=3 scored 68.5%.
These games are not available online.  I was going to check to see if the
programs got into a rut and played the same game over and over again, but I
can't do that.

Assuming that they played 100 unique games, the question remains as to whether
68.5% proves anything.  You can say, of course it does, but the real answer has
to do with statistics.  There is no way that a "real" scientific journal would
accept "of course it does" as an answer -- they'd want the math.  You don't
provide the math.

What are the odds that this result was due to chance?  The paper does not say,
and unless I wish to speculate, I can draw no conclusion from this other than
that it seems obvious that there is better than a 50% chance that VR=3 is better
than R=2.

Match result math is rarely if ever done in the computer chess field.  Figuring
out how to do this would be a *great* JICGA article, and it's amazing that
nobody has felt the need to do this until now.  Being able to make positive
statements about match scores would be worth something, you'd think, but 40
years into computer chess research nobody has published this.

>Isn't it clear that vrfd R=3 is superior to std R=3 ?

Maybe.  Everything that is said about the R=2 tests can be said here.  The
suites are weird.  But in addition, R=3 did not do terribly worse on the suites,
and did it in significantly less time.

It is unknown if it would have exceeded the results for VR=3, had it been given
as much time as VR=3 received.

In the article, there is no game-play data for VR=3 vs R=3, so there is no
evidence there.

It is possible to ask an additional question, based upon data in the article,
about whether R=3 is better than R=2.  That seems likely, based upon the data
presented, but once again, the suites are weird, so it is hard to accept
anything as proof of anything.

bruce

