Author: Don Dailey
Date: 16:04:12 01/13/98
Hi Bruce,

It sounds like you do almost everything the same way I do. I am also not
impressed with getting a single problem right if it hurts the others a
little.

I have a file of solution depths and times for just about any version I'm
interested in, and a utility that reads my output files and compares them.
I take the difference for each problem, not the total times, and average
the differences together. This way each problem is given equal weight; if
you take total time for the whole problem set, it will often be dominated
by one or two of the longest problems.

I have heard people talk about individual problems as if solving one would
be a major breakthrough for them. It's fun to do and I've done it myself,
but I don't take it very seriously unless the improvement generalizes well
to many problems (or costs only an incredibly small slowdown). I never
forget that a small 5% slowdown makes the program slower (weaker) in all
positions where the improvement doesn't help. But trying to solve the
tough ones is a good exercise and teaches us what does and doesn't work.

Here is another tool I find quite valuable. When I make a positional
improvement to the program, I like to know which positions are changed. I
run self-tests at fixed depths, each opening getting played once for each
side. I have a utility I call gamediff which compares the PGN output of
the test games, identifies every position where the move choice varied,
and creates a FEN position for each. I can go through the FEN positions
one by one and quickly see the side effects and problems of the new
change. It's always enlightening. I learn more about getting a heuristic
right this way than from anything else.

- Don
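The averaging idea is simple enough that a short script captures it. This
is only a sketch, not my actual utility -- the input format (one
"problem-id seconds" pair per line) is invented for illustration:

#!/usr/bin/env python3
# Sketch: compare two result files by averaging per-problem time
# differences, so each problem gets equal weight.
import sys

def read_times(path):
    times = {}
    with open(path) as f:
        for line in f:
            pid, secs = line.split()
            times[pid] = float(secs)
    return times

def main(file_a, file_b):
    a, b = read_times(file_a), read_times(file_b)
    common = sorted(set(a) & set(b))
    # Averaging the differences keeps one or two very long problems
    # from dominating the comparison, unlike summing total times.
    diffs = [b[p] - a[p] for p in common]
    print(f"{len(diffs)} problems, mean(B - A) = "
          f"{sum(diffs) / len(diffs):+.2f}s")

if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])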
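And a rough sketch of the gamediff idea, again not the real tool: this one
leans on the python-chess library, and it only reports the first
divergence in each game pair, since once the move choices differ the two
games reach different positions anyway:

#!/usr/bin/env python3
# Sketch: walk two PGN files game by game and print the FEN of the
# first position in each pair where the move choice differed.
import sys
import chess.pgn

def diff_games(game_a, game_b):
    board = game_a.board()
    for ma, mb in zip(game_a.mainline_moves(), game_b.mainline_moves()):
        if ma != mb:
            print(board.fen())  # position where the versions disagreed
            return              # the games diverge from here on
        board.push(ma)

def main(path_a, path_b):
    with open(path_a) as fa, open(path_b) as fb:
        while True:
            ga, gb = chess.pgn.read_game(fa), chess.pgn.read_game(fb)
            if ga is None or gb is None:
                break
            diff_games(ga, gb)

if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])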
On January 13, 1998 at 13:00:40, Bruce Moreland wrote:

>On January 13, 1998 at 09:08:49, Dan Homan wrote:
>
>>I can think of a few reasons that my search would have more nodes, but
>>none of them should be a factor of 3. One thing that did occur to me
>>is extensions. I turned off all extensions and re-searched this
>>position. In doing so, I found that I searched only about 30% of the
>>nodes to reach the same depth as the above example. Should my
>>extensions really be tripling the size of my search tree?
>
>There are some things that are hard to measure and some things that are
>easy to measure. I think that I can suggest how to measure some of the
>easy things.
>
>I have a utility that reads my log files, figures out how deep the
>program got for each problem, and will compare this depth, and the time
>taken to attain it, with a log file produced by another version.
>
>For instance, if version A got 9 plies, and version B got 8 plies, it
>will tell me this, and it will also tell me how long each of them took
>to get 8 plies, and will produce a ratio expressed as a percentage.
>
>It accumulates some of this information. In a two-problem suite, if
>both A and B got through 8 plies in the first problem, and it took A 12
>seconds and B 14 seconds, and both got through ply 12 in the second
>problem, A taking 22 seconds and B taking 24 seconds, I'll total this
>up, and get 34 seconds for A and 38 seconds for B. I'll output both of
>these numbers and the ratio between them, and conclude that B is 12%
>slower than A.
>
>This is not perfect. I should probably normalize the numbers somehow so
>that problems in which both of the programs finish very close to the
>maximum allowable times don't get more weight than those in which
>neither of them can quite finish that last ply. Also, this doesn't take
>into account that one of the versions might be getting closer to the
>real answer, and therefore is taking more time per ply. And finally, I
>have had a problem with disk caching -- the second run on any given
>night usually goes faster than the first one, so when I run these
>suites, some of the results are a little bogus.
>
>But even so, I can use this tool. If I want to examine the effects of
>some move ordering change, I can run a large suite for a reasonable
>amount of time per problem, and get a number that represents a rough
>guess about whether I made things better or worse.
>
>There is another benefit as well. This program looks at the node counts
>taken to complete every ply, it compares them, and it outputs *any*
>differences between the two versions in this respect. So if all I did
>was try to make the program go a little faster, without changing any
>semantics, I can easily look for node count changes between two
>versions, which are almost certainly bugs.
>
>So, to summarize this tool, it lets me get a rough idea if I've sped the
>program up, and lets me find bugs if I have tried to make a performance
>change.
>
>I have another tool that lets me evaluate the results of tactical
>suites. I can "grep" my log files for the string "Success", which will
>let me know how many problems I've gotten right with version B, and
>compare these numbers with the number of problems I got right with
>version A.
>
>I'm sure a lot of people do this, but I've taken it a step further. I
>have a tool that will tell me how many were right after 1 second, 2
>seconds, etc., out to the duration of the test run. I can look at this
>with my eyeballs or I can chart it with Excel. The shape of this curve
>is very interesting. Often, two versions will solve the same number,
>but if you look at a chart of these intermediate results, you will be
>able to choose between them, because one will get a tremendous lead, and
>the other will slowly catch up until at the end they are equal. In this
>case, the one that gets answers faster is better.
>
>I have another tool which does nothing but output time to solve for each
>problem for each version. I can load this into Excel and chart it,
>and notice things like one program solving problem 374 in 1 second while
>the other one solves it in 38 seconds. Sometimes there is an
>interesting explanation for this.
>
>So, if you had built these tools for yourself, and were willing to make
>a lot of sub-versions that you could test every day (I usually make 3 or
>4 versions a day), you could evaluate the effect of these extensions,
>together and in isolation.
>
>I get the idea that some people target a specific problem, and if they
>can mess with things until they get that problem faster, they call it
>good. I don't do this. I would much rather miss one problem than solve
>everything else 25% slower. So *every* time I mess with search
>extensions or pruning I run one of these suites and try to understand
>what the change actually did to overall ability to solve tactical
>problems. If something looks good, I'll run more suites the next day,
>just to make sure.
>
>I can't understand how anyone can survive without these tools, actually.
>
>Do you get more correct answers in the same time when you add all of
>your extensions in? Do any of them make your scores increase if you
>remove them? Can you mess with specific extensions to make them go
>faster without losing solutions?
>
>bruce
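For anyone who wants to build the kind of log comparison Bruce describes,
a sketch might look like this. The log format (one "problem depth seconds"
line per completed iteration) is invented for illustration; the script
times both versions at the deepest ply they both completed and accumulates
the totals:

#!/usr/bin/env python3
# Sketch: compare two iteration logs at the deepest shared ply
# per problem, and report per-problem times plus overall totals.
import sys
from collections import defaultdict

def read_log(path):
    # depths[problem][depth] = seconds to finish that iteration
    depths = defaultdict(dict)
    with open(path) as f:
        for line in f:
            pid, depth, secs = line.split()
            depths[pid][int(depth)] = float(secs)
    return depths

def main(log_a, log_b):
    a, b = read_log(log_a), read_log(log_b)
    total_a = total_b = 0.0
    for pid in sorted(set(a) & set(b)):
        shared = set(a[pid]) & set(b[pid])
        if not shared:
            continue
        d = max(shared)  # deepest ply both versions completed
        ta, tb = a[pid][d], b[pid][d]
        total_a += ta
        total_b += tb
        print(f"{pid}: ply {d}  A={ta:.1f}s  B={tb:.1f}s")
    print(f"totals: A={total_a:.1f}s  B={total_b:.1f}s  "
          f"B/A={100 * total_b / total_a:.0f}%")

if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])

The same per-ply records are the natural place to add the bug check Bruce
mentions: if a change was supposed to be semantics-preserving, any
node-count difference between the two versions is almost certainly a bug.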
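The cumulative-solutions curve takes only a few more lines. Given one
"problem seconds-to-solve" line per solved problem (again an invented
format, with unsolved problems simply omitted), this prints how many
problems were solved within each second, ready to chart for two versions
side by side:

#!/usr/bin/env python3
# Sketch: print the number of problems solved within 1s, 2s, ...
# so two versions' solution curves can be charted and compared.
import sys

def main(path, horizon=60):
    with open(path) as f:
        solve_times = [float(line.split()[1]) for line in f]
    for t in range(1, horizon + 1):
        solved = sum(1 for s in solve_times if s <= t)
        print(f"{t:3d}s  {solved}")

if __name__ == "__main__":
    main(sys.argv[1])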