Author: Robert Hyatt
Date: 17:31:30 01/13/98
On January 13, 1998 at 15:49:58, Don Dailey wrote:

>>This is not perfect. I should probably normalize the numbers somehow so
>>that problems in which both of the programs finish very close to the
>>maximum allowable times don't get more weight than those in which
>>neither of them can quite finish that last ply. Also, this doesn't take
>>into account that one of the versions might be getting closer to the
>>real answer, and therefore is taking more time per ply. And finally, I
>>have had a problem with disk caching -- the second run on any given
>>night usually goes faster than the first one, so when I run these
>>suites, some of the results are a little bogus.
>
>All of this stuff is a mess. I don't think the way problem sets are
>typically scored makes much sense. They should give credit for quicker
>solutions, in my opinion, not just total solved in less than x minutes.
>
>I believe the solution times should be an important factor. The tests
>should be run long enough so that getting a solution late gives very
>little credit and is basically equivalent to not solving it at all.
>This is not a perfect solution either but helps with the phenomenon of
>solving 1 second later than the specified time. Ideally you should be
>required to solve every problem but this is not a practical solution.
>There should also be a minimum solution time of something like 1 second
>because of i/o problems. My program for instance may solve a simple
>problem in 0.1 or 0.2 seconds randomly. I would normally give a lot of
>weight to solving something twice as fast but not in this case.
>
>If everyone agreed on a simple but more sensible method of scoring any
>problem set we could talk and compare numbers more meaningfully than
>we do now. I'm not saying there wouldn't still be problems though,
>there is the issue of do you wait to see if it keeps the solution,
>are there multiple solutions etc.
>
>Another problem with more complex scoring methods is that until people
>understand them, the numbers are even more ambiguous to people. Saying
>I solve 240 Win at Chess in less than 2 minutes is at least something
>you can understand immediately. But too much information is thrown
>away.
>
>But you get the idea. If I make the program 10 percent faster with
>no other side effects it might not show up at all in some problem
>set unless it just happens to pick up a problem or two.
>
>- Don

I always use the classic numerical analysis idea of "sum of squares"
here. I take each solution time, square it, and sum the squares. This
rewards speeding up the hard problems more than speeding up the easy
ones... because 10 seconds off a 60 second time is quite significant,
as opposed to going from 11 seconds to 1 second.

I look at other numbers too, but this sum of squares is a good, quick,
first approximation to quantify the timing result of the changes...
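To make the arithmetic concrete, here is a minimal sketch of the sum-of-squares score in Python. The function name and the two-problem suite are made up for illustration; the post does not give any code, and it does not say how unsolved problems should be counted, so that case is left out.

```python
def sum_of_squares(times):
    """Sum of squared solution times in seconds. Lower is better."""
    return sum(t * t for t in times)

# Hypothetical solution times (seconds) for a two-problem suite.
baseline = [60.0, 11.0]
faster_hard = [50.0, 11.0]  # 10 seconds saved on the hard problem
faster_easy = [60.0, 1.0]   # 10 seconds saved on the easy problem

# Improvement in the score for each change, relative to the baseline.
gain_hard = sum_of_squares(baseline) - sum_of_squares(faster_hard)
gain_easy = sum_of_squares(baseline) - sum_of_squares(faster_easy)

print(gain_hard, gain_easy)  # 1100.0 120.0
```

The same 10 second saving moves the score by 1100 on the 60 second problem (3600 - 2500) but only by 120 on the 11 second problem (121 - 1), which is the weighting toward hard problems described above.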
Last modified: Thu, 15 Apr 21 08:11:13 -0700