Author: Don Dailey
Date: 16:13:10 01/13/98
On January 13, 1998 at 16:53:46, Bruce Moreland wrote:
>
>On January 13, 1998 at 15:49:58, Don Dailey wrote:
>
>>>This is not perfect. I should probably normalize the numbers somehow so
>>>that problems in which both of the programs finish very close to the
>>>maximum allowable times don't get more weight than those in which
>>>neither of them can quite finish that last ply. Also, this doesn't take
>>>into account that one of the versions might be getting closer to the
>>>real answer, and therefore is taking more time per ply. And finally, I
>>>have had a problem with disk caching -- the second run on any given
>>>night usually goes faster than the first one, so when I run these
>>>suites, some of the results are a little bogus.
>>
>>All of this stuff is a mess. I don't think the way problem sets are
>>typically scored make much sense. They should give credit for quicker
>>solutions in my opinion not just total solved in less than x minutes.
>
>I'll try again. My first response to this was chopped off due to lag.
>
>This is not how I am using this. I am not checking times to solution,
>at all. I am comparing time to finish the last ply that both versions
>finished.
>
>If version A finishes 8 plies in a position, and version B finishes 9, I
>compare the times taken to finish 8 plies.
>
>If version A and version B don't differ very much, and you use a suite
>that is large enough, and positional enough, this can tell you which is
>faster.
>
>I would mainly do this if I am trying to figure out if a performance
>change really increased performance.
>
>Some people try to figure out if they have gotten faster by looking at
>nodes per second, which is wrong, because doing more nodes says nothing
>about whether you've also decreased efficiency.
>
>If I add a new extension, I will still collect this information, because
>I think it is interesting to know what effect my extension has had upon
>overall search depth. A version that takes twice as long, on average,
>to get to depth D, might be a little suspect unless it is somehow
>solving *everything* faster.
>
>But I think that testing like this is especially useful when you are
>trying to figure out if you have improved move ordering, or have made a
>significant performance change somehow.
>
>bruce

Hi Bruce,

I do pretty much the same thing but I'm speaking in more general
terms of better ways to score problem sets to compare with others.
I'm basically "fishing" for a better way, I probably won't change
the way I do my own testing because it works well for me.

- Don
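
For anyone who wants to try the comparison Bruce describes (time to finish the last ply that both versions finished), here is a rough Python sketch. It is not taken from either program. The data layout (a dict mapping a position id to a list of cumulative seconds per completed ply) and the function names are assumptions made up for illustration, and the geometric mean of per-position ratios is just one possible answer to the "normalize the numbers somehow" remark, not something either poster proposed.

```python
import math


def common_depth_times(times_a, times_b):
    """Return (depth, time_a, time_b) at the deepest ply both versions finished.

    times_a / times_b are hypothetical lists where index i holds the
    cumulative seconds needed to finish ply i + 1. Returns None if either
    version finished no plies at all.
    """
    depth = min(len(times_a), len(times_b))
    if depth == 0:
        return None
    return depth, times_a[depth - 1], times_b[depth - 1]


def compare_versions(suite_a, suite_b):
    """Compare two versions over a suite of positions.

    suite_a / suite_b map a position id to that version's per-ply times.
    Returns (total_a, total_b, geo_mean_ratio), where geo_mean_ratio is the
    geometric mean of time_b / time_a over positions, so a single slow
    position does not dominate the result (an assumed normalization).
    """
    total_a = total_b = 0.0
    log_sum = 0.0
    count = 0
    for pos_id, times_a in suite_a.items():
        result = common_depth_times(times_a, suite_b.get(pos_id, []))
        if result is None:
            continue  # nothing comparable for this position
        _, t_a, t_b = result
        total_a += t_a
        total_b += t_b
        if t_a > 0 and t_b > 0:
            log_sum += math.log(t_b / t_a)
            count += 1
    geo_mean = math.exp(log_sum / count) if count else float("nan")
    return total_a, total_b, geo_mean


# Made-up timings: version A finishes 3 plies in p1, version B finishes 4,
# so only the times to finish ply 3 are compared.
suite_a = {"p1": [1.0, 2.0, 4.0], "p2": [1.0, 3.0]}
suite_b = {"p1": [1.0, 2.0, 8.0, 16.0], "p2": [0.5]}
total_a, total_b, ratio = compare_versions(suite_a, suite_b)
```

A ratio above 1 would suggest version B is slower to reach the same depth on this suite, below 1 faster. As the post says, this only means much when the two versions are otherwise close and the suite is large enough.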
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.