# Computer Chess Club Archives

## Messages

### Subject: Re: Proving something is better

Author: Omid David Tabibi

Date: 17:44:45 12/17/02


```
On December 17, 2002 at 19:48:42, Dann Corbit wrote:

>On December 17, 2002 at 19:42:10, Bruce Moreland wrote:
>
>>On December 17, 2002 at 19:10:42, Dann Corbit wrote:
>>
>>>I think perhaps a good measure of ability would be to take a set such as WAC and
>>>normalize it with a good engine on a platform of known strength.  The time to
>>>complete would be (perhaps) 5 seconds per position, and the square root of the
>>>sum of the time squared would be used as a measure.
>>>
>>>Let's suppose that on a 1GHz machine, Crafty solves 297/300 and that the square
>>>root of the sum of the time squared was 300.  If two programs solve an equal
>>>number of problems, then we use the time for a measure of goodness.  If not,
>>>then the number of solutions will be more important.
>>>Now, we will have a test that should be fairly reproducible.  Repeat this test
>>>procedure for a dozen or so test sets.
>>>
>>>After all, when playing chess, two things are important:
>>>1.  Getting it right.
>>>2.  Getting it fast.
>>>
>>>If other programs were tested under a similar setup, we might find some
>>>interesting results.  For instance, if one program averages 1/10 of a second to
>>>solve problems, even though it solves the same number, it would probably
>>>dominate over a program that takes 1 second on average to solve them.  Of
>>>course, it might not scale cleanly to longer time controls, but it seems nobody
>>>has the patience to test them like that.
>>>
>>>I suggest taking the square root of the sum of the squares to reduce the effect
>>>of sports that are abnormal either in quickness or slowness to solve.  Then the
>>>general ability will be more clearly seen.  A straight arithmetic average could
>>>easily be bent by outliers.
>>
>>I think that this is diverting, mostly.
>>
>>Let's stipulate for the moment that getting more answers in less time is *proof*
>>that a version is better tactically.
>
>It is really proof that the test set of problems is solved faster.  To assume
>that the program is better tactically with only one set of problems is (I think)
>a serious mistake.
>
>>The way Omid did his test, you can't tell
>>the new version is better, because he didn't provide the right numbers.  We
>>don't know if it got more answers in less time than the R=3 version.
>
>We know fewer nodes.  So we can say:
>"The new version solves this test set in fewer nodes."
>Nothing more, nothing less.
>
>>We have his new version, and it gets to the same depth more slowly, and finds
>>more answers, than R=3.  This proves nothing.  I could make a program where the
>>eval function incorporates a 2-ply search.  It would take longer to search 9
>>plies, but it would get a lot more right.  This is the same result that Omid
>>got.  Did he just prove that my hypothetical program is better?  Of course not.
>>
>>If you accept his method as proof, he did prove that VR=3 is better than R=2, I
>>point out.  But he should have tackled R=3, too, if he is going to present that
>>data.
>
>No, he does not have to prove that unless he states that condition.  IOW, if he
>makes a statement about the experimental outcome, he should provide data to back
>it up.  If he fails to provide data, then the statement is wild extrapolation.
>
>I don't remember if he claimed that VR=3 was better than R=3.  If he did state
>that and failed to provide data, then it is an unverified assumption.

Heinz' experiments showed that std R=3 is weaker than std R=2 [1]. Bruce's
Ferret also used std R=2 in WCCC 1999 [2]. So I took the one which is believed
to be stronger (std R=2), and showed that vrfd R=3 is superior to it.

[1] Heinz, E.A. (1999). Adaptive null-move pruning. ICCA Journal, Vol. 22,
No. 3, pp. 123--132.
[2] Feist, M. (1999). The 9th World Computer-Chess Championship: Report on
the tournament. ICCA Journal, Vol. 22, No. 3, pp. 155--164.

```
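The scoring scheme Dann proposes above, in which solution count dominates and the square root of the sum of squared solution times breaks ties, can be sketched as follows. This is an illustrative reading of his description, not code from the thread; the program names, solve counts, and timings are hypothetical.

```python
import math

# Hypothetical per-program results over one test set (e.g. WAC):
# positions solved out of 300, and per-position solution times in seconds.
results = {
    "ProgramA": {"solved": 297, "times": [0.1, 0.4, 2.0, 0.3]},
    "ProgramB": {"solved": 297, "times": [0.2, 0.2, 0.2, 0.2]},
    "ProgramC": {"solved": 290, "times": [0.1, 0.1, 0.1, 0.1]},
}

def rss_time(times):
    """Square root of the sum of the squared times, as proposed."""
    return math.sqrt(sum(t * t for t in times))

def rank_key(name):
    r = results[name]
    # More solutions is always better; among equal counts,
    # a lower root-sum-square time wins.
    return (-r["solved"], rss_time(r["times"]))

ranking = sorted(results, key=rank_key)
print(ranking)  # ['ProgramB', 'ProgramA', 'ProgramC']
```

ProgramA and ProgramB solve the same number of positions, so the tie is decided by the time aggregate; ProgramC is ranked last despite its fast times because, as the post says, the number of solutions is more important than speed.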