Author: Bruce Moreland
Date: 10:00:40 01/13/98
On January 13, 1998 at 09:08:49, Dan Homan wrote:

>I can think of a few reasons that my search would have more nodes, but
>none of them should be a factor of 3. One thing that did occur to me
>is extensions. I turned off all extensions and re-searched this
>position. In doing so, I found that I searched only about 30% of the
>nodes to reach the same depth as the above example. Should my
>extensions really be tripling the size of my search tree?

There are some things that are hard to measure and some things that are easy to measure. I think that I can suggest how to measure some of the easy things.

I have a utility that reads my log files, figures out how deep the program got for each problem, and compares this depth, and the time taken to attain it, with a log file produced by another version. For instance, if version A got 9 plies and version B got 8 plies, it will tell me this, and it will also tell me how long each of them took to get 8 plies, and will produce a ratio expressed as a percentage.

It accumulates some of this information. In a two-problem suite, if both A and B got through 8 plies in the first problem, A taking 12 seconds and B taking 14 seconds, and both got through ply 12 in the second problem, A taking 22 seconds and B taking 24 seconds, I'll total this up and get 34 seconds for A and 38 seconds for B. I'll output both of these numbers and the ratio between them, and conclude that B is 12% slower than A.

This is not perfect. I should probably normalize the numbers somehow, so that problems in which both of the programs finish very close to the maximum allowable time don't get more weight than those in which neither of them can quite finish that last ply. Also, this doesn't take into account that one of the versions might be getting closer to the real answer, and therefore taking more time per ply.
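The comparison described above could be sketched roughly as follows. This is a minimal illustration, not Bruce's actual utility: the log-line format (`<problem> ply=<depth> time=<seconds>`) and all the names here are invented for the example.

```python
# Hypothetical sketch of a depth/time log-comparison tool.
# Assumes each log line looks like: "<problem> ply=<depth> time=<seconds>"
# (an invented format -- the real log format is not shown in the post).

def parse_log(lines):
    """Return {problem: {ply: cumulative_seconds}} from log lines."""
    results = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue
        prob = parts[0]
        ply = int(parts[1].split("=")[1])
        secs = float(parts[2].split("=")[1])
        results.setdefault(prob, {})[ply] = secs
    return results

def compare(log_a, log_b):
    """Sum the time each version took to reach the deepest ply BOTH completed,
    and report how much slower B is than A, as a percentage."""
    a, b = parse_log(log_a), parse_log(log_b)
    total_a = total_b = 0.0
    for prob in sorted(a.keys() & b.keys()):
        common_ply = max(a[prob].keys() & b[prob].keys())
        total_a += a[prob][common_ply]
        total_b += b[prob][common_ply]
    return total_a, total_b, 100.0 * (total_b - total_a) / total_a

# The two-problem example from the text: A reaches the shared depths in
# 12 and 22 seconds, B in 14 and 24 seconds.
log_a = ["p1 ply=8 time=12", "p2 ply=12 time=22"]
log_b = ["p1 ply=8 time=14", "p2 ply=12 time=24"]
ta, tb, pct = compare(log_a, log_b)  # totals 34 and 38; B about 12% slower
```

The key design point is comparing at the deepest ply *both* versions finished, so a version that squeezed in one extra partial ply isn't penalized for it.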
And finally, I have had a problem with disk caching -- the second run on any given night usually goes faster than the first one, so when I run these suites, some of the results are a little bogus.

But even so, I can use this tool. If I want to examine the effects of some move-ordering change, I can run a large suite for a reasonable amount of time per problem, and get a number that represents a rough guess about whether I made things better or worse.

There is another benefit as well. This program looks at the node counts taken to complete every ply, compares them, and outputs *any* differences between the two versions in this respect. So if all I did was try to make the program go a little faster, without changing any semantics, I can easily look for node-count changes between the two versions, which are almost certainly bugs.

So, to summarize this tool: it lets me get a rough idea of whether I've sped the program up, and it lets me find bugs if I have tried to make a performance change.

I have another tool that lets me evaluate the results of tactical suites. I can "grep" my log files for the string "Success", which will let me know how many problems I've gotten right with version B, and compare that number with the number of problems I got right with version A. I'm sure a lot of people do this, but I've taken it a step further. I have a tool that will tell me how many were right after 1 second, 2 seconds, etc., out to the duration of the test run. I can look at this with my eyeballs, or I can chart it with Excel.

The shape of this curve is very interesting. Often, two versions will solve the same number of problems, but if you look at a chart of these intermediate results, you will be able to choose between them, because one will get a tremendous lead and the other will slowly catch up until, at the end, they are equal. In this case, the one that gets answers faster is better.

I have another tool which does nothing but output the time to solve each problem for each version.
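The "how many were right after N seconds" curve could be computed along these lines. Again a sketch only: the data shape (problem id mapped to time-to-first-"Success", `None` if unsolved) and the sample numbers are assumptions for illustration.

```python
# Hypothetical sketch of the solved-after-N-seconds curve.
# solve_times maps problem id -> seconds until the log first said "Success",
# or None if the problem was never solved (invented data for illustration).

def solve_curve(solve_times, duration):
    """For each whole second 1..duration, count problems solved by then."""
    return [
        sum(1 for t in solve_times.values() if t is not None and t <= cutoff)
        for cutoff in range(1, duration + 1)
    ]

version_a = {"p1": 1.0, "p2": 2.5, "p3": None, "p4": 0.8}
version_b = {"p1": 2.0, "p2": 4.0, "p3": 3.5, "p4": 1.0}

curve_a = solve_curve(version_a, 5)  # [2, 2, 3, 3, 3]
curve_b = solve_curve(version_b, 5)  # [1, 2, 2, 4, 4]
```

Both final counts are close here, but the curves differ along the way, which is exactly the distinction the post describes: charting the two lists shows which version gets its answers sooner.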
I can load this into Excel and chart it, and notice things like one program solving problem 374 in 1 second while the other one solves it in 38 seconds. Sometimes there is an interesting explanation for this.

So, if you had built these tools for yourself, and were willing to make a lot of sub-versions that you could test every day (I usually make 3 or 4 versions a day), you could evaluate the effect of these extensions, together and in isolation.

I get the idea that some people target a specific problem, and if they can mess with things until they get that problem faster, they call it good. I don't do this. I would much rather miss one problem than solve everything else 25% slower. So *every* time I mess with search extensions or pruning, I run one of these suites and try to understand what the change actually did to the overall ability to solve tactical problems. If something looks good, I'll run more suites the next day, just to make sure.

I can't understand how anyone can survive without these tools, actually.

Do you get more correct answers in the same time when you add all of your extensions in? Do any of them make your scores increase if you remove them? Can you mess with specific extensions to make them go faster without losing solutions?

bruce
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.