Author: Robert Hyatt
Date: 14:30:35 09/03/02
As is usually the case, someone who helped me with this had sent an email
while I was responding to the other posts. And when I read the second
paragraph, it all "came back".
Here is the issue:
I started with a 16 cpu game log file. Note that this was from a real
game, and in it I would find output just like Crafty's: lines of the form

    depth  time  eval  PV

followed by a summary.
The problem is that the node count in the summary has nothing to do with the
point where the PV was displayed. The program _could_ have stopped the search
as soon as the PV was displayed, or it could have stopped minutes later.
As a result, I had no real node counts for the 16 cpu test that could be
compared to anything else, since there was no way to know when the 16 cpu
search actually completed.
We chose to do the following:
1. Run the positions through a one-processor search. Since there was no
parallel searching going on, we could display an _exact_ node count for the
one-processor test, as it would have been had the search stopped immediately
after producing the critical PV move at the final depth. That value _is_ a
raw data point.
2. We then ran the positions through the 2-processor search, taking the time
at which the same PV appeared as the measured time. All the times are pure raw
data, exactly. But we couldn't get a good node count. What we chose to do was
to use an internal performance monitor we had built in, which very precisely
told us how much cpu time had been spent playing chess by each processor. From
these times, we computed speedups for 2, 4, 8 and 16 processors (we didn't run
the 16 cpu test again; we just used the raw log from the mchess pro game).
3. We now had a set of speedups for each test, which we plugged into the
article. And again, it is important to note that for this data, the raw
speedup was computed by dividing the times as you would expect.
For the node counts, which were impossible for us to obtain from any but the
one-processor test, we simply extrapolated them based on the cpu utilization
of all the processors. Some simple testing, searching to a fixed depth on
one processor and then on 16 processors, showed that our "extrapolation" was
"right on"... and we used those node counts.
4. Clearly, the node counts are therefore produced from the raw 1-cpu data,
multiplied by the percent of cpu utilization for the 2, 4, 8 and 16 cpu test
cases. So they should correlate 100%. A minimal sketch of that arithmetic
follows below.
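To make that concrete, here is a minimal C sketch of my reading of the
extrapolation. This is not the original "log eater" code; the function name
and the sample numbers are made up for illustration, and it assumes that
nodes-per-second per cpu is roughly constant:

    #include <stdio.h>

    /* Extrapolated N-cpu node count from the exact 1-cpu count, the
     * measured speedup, and the per-cpu utilization reported by the
     * performance monitor.  With 100% utilization and a perfect
     * speedup of N this collapses back to nodes_1cpu; any shortfall
     * in speedup shows up as extra nodes (search overhead). */
    double extrapolate_nodes(double nodes_1cpu, int ncpus,
                             double utilization, double speedup) {
      return nodes_1cpu * ncpus * utilization / speedup;
    }

    int main(void) {
      double nodes_1 = 100e6; /* hypothetical exact 1-cpu node count */
      /* 16 cpus, fully busy, speedup of 8: expect 2x the 1-cpu nodes */
      printf("extrapolated: %.0f nodes\n",
             extrapolate_nodes(nodes_1, 16, 1.0, 8.0));
      return 0;
    }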
The only thing my (nameless) partner said was that he could not remember
whether we did the same thing to produce the times, since that would have been
easier than trying to extract them from the logs later to produce the table of
times. He "thought" that the times were added after a request from a referee,
so that is possible.
So, perhaps the data has some questionable aspects to it. The only part that
I am _certain_ is "raw data" is the individual speedup values, because that is
what we were looking at specifically. I had not remembered the node count
problem until this email came in, and then I remembered a case where Vincent
was trying to prove something about Crafty and got node counts suggesting that
it should have gotten a > 2.0 speedup. I had pointed out that the way I count
nodes, it is impossible to produce an accurate number anywhere except when all
processors are idle. I _should_ have remembered that we had the same problem
back then. I am therefore afraid that the times might have been computed in
the same way, since it would have been quite natural to do so...
I don't think this changes one iota about what is going on, of course, as
given a speedup and the total time used by Crafty, I can certainly compute a
node count that will be _very_ close to the real one. I suppose I should add
that, so that Vincent can have his "every time the PV changes, give me nodes"
type of value.
Keep in mind that this was an email from someone who worked on this with me
back then. His memory was somewhat better because he actually wrote the code
to solve the problem. But again, he was _very_ vague in remembering everything.
It took a phone call between us to discuss this and get as far as I did above.
I might remember more as time goes on.
But the bottom line is: trust the speedup numbers explicitly. And if you
trust them, the others can be derived directly from them. For 16 cpus, Cray
Blitz generally searched 100% of the time on each cpu. If it produced a speedup
of 16, then each cpu searched 1/16th of the total nodes searched by one
processor. If it produced a speedup of 8, then each cpu searched 1/8 of the
nodes searched by one processor, which is 2x the total nodes, a.k.a. search
overhead.
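As a quick sanity check of that arithmetic, here is a short C sketch under the
same assumptions (16 cpus, each 100% busy; node counts expressed as a multiple
of the 1-cpu count):

    #include <stdio.h>

    /* Per-cpu and total node counts implied by a given speedup,
     * relative to the 1-cpu node count. */
    int main(void) {
      const int ncpus = 16;
      const double speedups[] = {16.0, 8.0};
      for (int i = 0; i < 2; i++) {
        double per_cpu = 1.0 / speedups[i]; /* fraction each cpu searches */
        double total = per_cpu * ncpus;     /* summed over all 16 cpus */
        printf("speedup %4.1f: each cpu %.4f of the 1-cpu nodes, total %.2fx\n",
               speedups[i], per_cpu, total);
      }
      return 0;
    }

A speedup of 16 gives a total of 1.00x (no extra nodes); a speedup of 8 gives
2.00x, the search overhead mentioned above.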
Sorry for the confusion. Remembering stuff done 10 years ago is difficult
enough; remembering the "log eater" was harder since I didn't write all of it...
Bob