# Computer Chess Club Archives

## Messages

### Subject: Re: Proving something is better

Author: Omid David Tabibi

Date: 17:44:45 12/17/02


```
On December 17, 2002 at 19:48:42, Dann Corbit wrote:

>On December 17, 2002 at 19:42:10, Bruce Moreland wrote:
>
>>On December 17, 2002 at 19:10:42, Dann Corbit wrote:
>>
>>>I think perhaps a good measure of ability would be to take a set such as WAC and
>>>normalize it with a good engine on a platform of known strength.  The time to
>>>complete would be (perhaps) 5 seconds per position, and the square root of the
>>>sum of the time squared would be used as a measure.
>>>
>>>Let's suppose that on a 1GHz machine, Crafty solves 297/300 and that the square
>>>root of the sum of the time squared was 300.  If two programs solve an equal
>>>number of problems, then we use the time for a measure of goodness.  If not,
>>>then the number of solutions will be more important.
>>>Now, we will have a test that should be fairly reproducible.  Repeat this test
>>>procedure for a dozen or so test sets.
>>>
>>>After all, when playing chess, two things are important:
>>>1.  Getting it right.
>>>2.  Getting it fast.
>>>
>>>If other programs were tested under a similar setup, we might find some
>>>interesting results.  For instance, if one program averages 1/10 of a second to
>>>solve problems, even though it solves the same number, it would probably
>>>dominate over a program that takes 1 second on average to solve them.  Of
>>>course, it might not scale cleanly to longer time controls, but it seems nobody
>>>has the patience to test them like that.
>>>
>>>I suggest taking the square root of the sum of the squares to reduce the effect
>>>of sports that are abnormal either in quickness or slowness to solve.  Then the
>>>general ability will be more clearly seen.  A straight arithmetic average could
>>>easily be bent by outliers.
>>
>>I think that this is diverting, mostly.
>>
>>Let's stipulate for the moment that getting more answers in less time is *proof*
>>that a version is better tactically.
>
>It is really proof that the test set of problems is solved faster.  To assume
>that the program is better tactically with only one set of problems is (I think)
>a serious mistake.
>
>>The way Omid did his test, you can't tell
>>the new version is better, because he didn't provide the right numbers.  We
>>don't know if it got more answers in less time than the R=3 version.
>
>We know fewer nodes.  So we can say:
>"The new version solves this test set in fewer nodes."
>Nothing more, nothing less.
>
>>We have his new version, and it gets to the same depth more slowly, and finds
>>more answers, than R=3.  This proves nothing.  I could make a program where the
>>eval function incorporates a 2-ply search.  It would take longer to search 9
>>plies, but it would get a lot more right.  This is the same result that Omid
>>got.  Did he just prove that my hypothetical program is better?  Of course not.
>>
>>If you accept his method as proof, he did prove that VR=3 is better than R=2, I
>>point out.  But he should have tackled R=3, too, if he is going to present that
>>data.
>
>No, he does not have to prove that unless he states that condition.  IOW, if he
>makes a statement about the experimental outcome, he should provide data to back
>it up.  If he fails to provide data, then the statement is wild extrapolation.
>
>I don't remember if he claimed that VR=3 was better than R=3.  If he did state
>that and failed to provide data, then it is an unverified assumption.

Heinz' experiments showed that std R=3 is weaker than std R=2 [1]. Bruce's
Ferret also used std R=2 in WCCC 1999 [2]. So I took the one which is believed
to be stronger (std R=2), and showed that vrfd R=3 is superior to it.

[1] Heinz, E.A. (1999). Adaptive null-move pruning. ICCA Journal, Vol. 22,
No. 3, pp. 123--132.
[2] Feist, M. (1999). The 9th World Computer-Chess Championship: Report on
the tournament. ICCA Journal, Vol. 22, No. 3, pp. 155--164.

```
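The scoring scheme Dann proposes above, in which solution count dominates and the square root of the sum of squared solution times breaks ties, can be sketched as follows. This is an illustrative reading of his description, not code from the thread; the program names, solve counts, and timings are hypothetical.

```python
import math

# Hypothetical per-program results over one test set (e.g. WAC):
# positions solved out of 300, and per-position solution times in seconds.
results = {
    "ProgramA": {"solved": 297, "times": [0.1, 0.4, 2.0, 0.3]},
    "ProgramB": {"solved": 297, "times": [0.2, 0.2, 0.2, 0.2]},
    "ProgramC": {"solved": 290, "times": [0.1, 0.1, 0.1, 0.1]},
}

def rss_time(times):
    """Square root of the sum of the squared times, as proposed."""
    return math.sqrt(sum(t * t for t in times))

def rank_key(name):
    r = results[name]
    # More solutions is always better; among equal counts,
    # a lower root-sum-square time wins.
    return (-r["solved"], rss_time(r["times"]))

ranking = sorted(results, key=rank_key)
print(ranking)  # ['ProgramB', 'ProgramA', 'ProgramC']
```

ProgramA and ProgramB solve the same number of positions, so the tie is decided by the time aggregate; ProgramC is ranked last despite its fast times because, as the post says, the number of solutions is more important than speed.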