Author: Bruce Moreland
Date: 11:49:49 01/13/98
On January 13, 1998 at 14:08:00, Dan Homan wrote:

>Great suggestion Bruce, thanks.
>
>The hardest part is evaluating the changes I make, and these tools you
>suggest sound like a very good idea. My technique for evaluating
>changes up to this point has been either a) run WAC by hand noting any
>obvious changes in solution times/scores, but this is difficult to
>do precisely or b) let the program play on FICS for a while and
>observe the games. Any technique is flawed but yours sound like it
>at least provides quantitative (and diverse) information on which to
>make a decision.

It is a good idea to automate your test suite runs.

My program is passed the test suite name and time information via the
command line, and when it finishes a suite it quits, which lets me run
the program several times via a batch file. I number the output files
according to which version created the file, and I number the
executables too, so I can easily figure out what I've already done, and
I can conveniently run something new, even with a very old version.

I'm just making use of a few semi-truths:

1) If two programs search the same tree and produce the same result,
the one that does it faster is stronger.

2) If you are trying for a strength increase via point #1, but you
notice that the tree has changed as a result, it is likely that a bug
is involved.

3) If you make a change to extensions or pruning, it's worth checking
against a large and diverse tactical suite, as well as checking to see
what effect your change has on tree size in positional cases.

There are cases that are hard to test. If I increase my doubled pawn
penalty, I'll run suites in order to establish a new baseline for later
comparison, but it doesn't really matter whether it solves a few more
or fewer on a big tactical suite, or whether the program gets to depth
D a little faster or slower. A change like this isn't going to have
that kind of impact legitimately, so what I'm seeing is probably just
noise. If there is a *huge* impact from a simple change, though, it's
worth investigating, since you may have completely wrecked something.

I don't know how to verify small changes. I use the qualitative method,
like you do (watching games on a server), but this is flawed too. You
can get a mistaken impression easily, I think. And I think it's almost
useless to watch for rating-point changes when you make a small (or
even large) change, since the ratings on the chess servers are pretty
random. I've seen more than a 200-point variance with the same version
if you leave it on for a few days, and it is hard to imagine that a
change in a chess program, unless it introduced or fixed a really
massive bug, could be detected against that.

bruce
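PS: In case a skeleton helps, here is roughly the shape of the harness
I'm describing, sketched in C. The argument layout, the version-numbered
log name, and the run_suite() stub are just illustrations, not my real
code; the actual search loop goes where the stub is.

/*
 * Minimal sketch of a batch-driven test harness: suite name and time
 * come in on the command line, results go to a version-numbered log,
 * and the program quits when the suite is done.
 */
#include <stdio.h>
#include <stdlib.h>

#define VERSION 17   /* bump for each new executable */

/* Stub standing in for the real search loop: a real engine would
   search every position in the suite for the given number of seconds
   and log moves, scores, and times to 'out'. */
static void run_suite(const char *suite_file, int seconds, FILE *out)
{
    fprintf(out, "suite=%s  seconds=%d\n", suite_file, seconds);
}

int main(int argc, char **argv)
{
    char out_name[64];
    FILE *out;

    if (argc != 3) {
        fprintf(stderr, "usage: engine <suite.epd> <seconds-per-position>\n");
        return 1;
    }

    /* Number the output file after the engine version, so results
       from old versions stay around for later comparison. */
    sprintf(out_name, "wac_v%03d.log", VERSION);
    out = fopen(out_name, "w");
    if (out == NULL) {
        perror(out_name);
        return 1;
    }

    run_suite(argv[1], atoi(argv[2]), out);
    fclose(out);

    /* Quit when the suite is done, so a batch file can start the
       next run or the next version. */
    return 0;
}

With that in place, the batch file is just one line per run, something
like "engine17 wac.epd 5", and you can queue up as many versions and
suites as you want overnight.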