Computer Chess Club Archives


Subject: Re: More on the "bad math" after an important email...

Author: Robert Hyatt

Date: 15:22:37 09/03/02



On September 03, 2002 at 17:49:33, Georg v. Zimmermann wrote:

>Hello,
>
>I am afraid I have to say I don't like this at all.  IMHO it is a very bad
>idea to include "total nodes searched" figures that are not actual total
>nodes searched in a scientific article.
>I do not see why you did not just include the speedup factors instead.

Two points.  I originally included only the "speedup data", but the referees
wanted nodes and times as well, because those numbers had been published in
other tests.  I pointed out that my test was not to fixed depth, so the nodes
could not be counted.  We agreed on the "computed nodes" as a perfectly
acceptable substitute, since I ran a couple of tests and showed that the
actual nodes searched and the computed nodes searched were very close.  As in
a fraction of a percent...

If you think the nodes searched and times are more important than the
actual speedups, then there is little more I can add.  Neither of us is
sure how the times were recorded, so they might be raw or they might be
computed, since we already had to compute the nodes.  But as far as the
crux of the paper goes, the DTS algorithm and its speedup, the numbers given
are absolutely accurate.  And, were it possible to produce good node counts,
you would see that they would be completely accurate as well.  The hardware
introduces more variability through memory conflicts than the computation
introduced into the node counts...


>
>And could you elaborate on why you couldn't do node counts?  I know next to
>nothing about multi-processor search, so I don't understand why you can't
>simply do positionsSearched[processor]++; and later add them all up, or
>something similar.  AFAIK all other Deep Something programs report node
>counts even when they don't get 100% cpu time?


Sure..  Say I do a search to 10 plies and get a PV at 1 minute.  I search
another 3 minutes and give up.  I _then_ print the nodes searched.  But
those nodes have nothing to do with the time needed to produce the PV.
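
To put made-up numbers on it (this is just an illustration, not data from
the test):

  #include <stdio.h>

  /* Illustration only, numbers invented: why the node count printed at
     the end says nothing about the PV.  At a roughly constant search
     speed, the count printed when we give up at 4 minutes is ~4x the
     count that actually produced the PV at 1 minute. */
  int main(void) {
    double nps = 1.0e6;    /* assumed constant nodes per second */
    double t_pv = 60.0;    /* PV displayed at 1 minute */
    double t_end = 240.0;  /* search abandoned at 4 minutes */
    printf("nodes behind the PV : %.0f\n", nps * t_pv);   /* 60000000  */
    printf("nodes printed at end: %.0f\n", nps * t_end);  /* 240000000 */
    return 0;
  }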

What node count do I use?  I can't display node counts "on the fly" because
the nodes are kept in separate data structures, thread by thread, and a
thread can have multiple split points and a data structure for each with a
node count in it.  Which ones to add up?  Which have _already_ been added up?

I can't tell until the search ends.  But in this test the search never "ended"
at a clean point.  That was why we had the node count problem then, and it is
also why I don't/can't display node counts on the fly now.  On an SMP machine,
if you run Crafty and type "." you will get a node count.  Wait a few seconds
and type "." again.  You might get the same number, or you might get a larger
number that represents _all_ of the nodes searched so far, or just a part of
them.  That is the way the DTS algorithm works, and it is the way the DTS
variant I use in Crafty works as well, sometimes to my dissatisfaction...
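
If it helps, here is a bare-bones sketch of the bookkeeping problem, in C.
The names are made up for illustration; this is not the actual Crafty or
Cray Blitz code:

  #include <stddef.h>

  /* Each thread owns one or more split points, each with its own node
     counter.  A counter is rolled into its parent only when that split
     point completes. */
  typedef struct split_point {
    struct split_point *parent;  /* NULL at the root */
    long nodes;                  /* nodes searched under this split point */
    int done;                    /* already rolled into the parent? */
  } split_point;

  /* When a split point completes, fold its count upward.  Until the
     whole search ends, some counts have been folded and some have not. */
  void complete_split(split_point *sp) {
    if (sp->parent)
      sp->parent->nodes += sp->nodes;
    sp->done = 1;
  }

  /* A mid-search snapshot over the pool of split point blocks.  Summing
     every counter double-counts completed split points, whose nodes were
     already folded into a parent.  In a real search the blocks are also
     reused for new split points and updated concurrently, so a snapshot
     cannot tell which counts overlap.  Only when the search has ended,
     with everything folded into the root, is the total exact. */
  long snapshot(split_point *pool, int n) {
    long total = 0;
    for (int i = 0; i < n; i++)
      total += pool[i].nodes;  /* which ones?  which were already added? */
    return total;
  }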






>
>Finally I would like to say that I appreciate Dr. Hyatt's work a lot, and
>however questionable this one article might be - or might not be, not all
>that glitters is gold - I would probably never have had as much fun with
>computer chess without him helping the guys who helped me understand
>things. :)
>
>Georg
>
>
>
>
>On September 03, 2002 at 17:30:35, Robert Hyatt wrote:
>
>>As is usually the case, someone that helped me with this had sent an email
>>while I was responding to the other posts.  And when I read the second
>>paragraph, it all "came back".
>>
>>Here is the issue:
>>
>>I started with a 16 cpu game log file.  Note that this was from a real
>>game.  And in it I would find output just like Crafty's...  Here is the
>>idea:
>>
>>      depth    time    eval    PV
>>
>>followed by a summary.
>>
>>The problem is that the node count in the summary has nothing to do with the
>>PV at the moment it was displayed.  The program _could_ have stopped the search
>>as soon as the PV was displayed, or it could have stopped it minutes later.
>>As a result, I had no real node counts for the 16 cpu test that could be
>>compared to anything else since there was no way to know when the 16 cpu
>>test completed.
>>
>>We chose to do the following:
>>
>>1.  run the positions thru a one processor search, and since there was no
>>parallel searching going on, we could display an _exact_ node count for the
>>one-processor test, as it would have been had the search stopped immediately
>>after producing the critical PV move at the final depth.  That value _is_ a
>>raw data point.
>>
>>2.  We then ran the positions thru the 2-processor search, taking the time
>>to produce the same PV as the measurement.  All the times are pure raw data,
>>exactly.  But we couldn't get a good node count.  What we chose to do was to
>>use an internal performance monitor we had built in, which very precisely told
>>us how much cpu time had been spent playing chess by each processor.  From
>>these times, we computed speedups for 2, 4, 8 and 16 processors (we didn't run
>>the 16 cpu test again; we just used the raw log from the mchess pro game).
>>
>>3.  We now had a set of speedups for each test, which we plugged into the
>>article.  And again, it is important to note that for this data, the raw
>>speedup was computed by dividing the times as you would expect.
>>
>>For the node counts, which were impossible for us to obtain from any but the
>>one processor test, we simply extrapolated them based on the cpu utilization
>>of all the processors.  Some simple testing, by searching to a fixed depth on
>>one processor and then on 16 processors, showed that our "extrapolation" was
>>"right on"...  and we used those node counts.
>>
>>4.  Clearly, the node counts are therefore produced from the raw 1-cpu data,
>>multiplied by the percent of cpu utilization for the 2, 4, 8 and 16 cpu test
>>cases.  So they should correlate 100%.
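
To make that extrapolation concrete, here is how I would write it down today.
This is a sketch with invented names and numbers, not the original "log eater"
code, and it assumes a roughly constant per-cpu search speed:

  #include <stdio.h>

  /* nodes_1 : exact node count from the 1-cpu run (raw data)
     cpu_1   : cpu seconds the 1-cpu run spent searching
     cpu_n[] : cpu seconds each of the n processors spent searching,
               as reported by the internal performance monitor */
  long extrapolate_nodes(long nodes_1, double cpu_1,
                         const double *cpu_n, int n) {
    double total = 0.0;
    for (int i = 0; i < n; i++)
      total += cpu_n[i];
    /* constant nodes-per-second assumption: nodes scale with the
       cpu time actually spent playing chess */
    return (long)(nodes_1 * (total / cpu_1));
  }

  int main(void) {
    double cpu_n[4] = {55.0, 54.0, 56.0, 55.0};  /* invented: 220 cpu sec */
    printf("%ld\n", extrapolate_nodes(100000000, 180.0, cpu_n, 4));
    /* prints 122222222: 220/180 of the 1-cpu nodes, i.e. about 22%
       search overhead */
    return 0;
  }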
>>
>>The only thing my (nameless) partner said was that he could not remember
>>whether we did the same thing to produce the times, since it would have been
>>easier than extracting them from the logs later to build the table of times.
>>He "thought" that the times were added after a request from a referee, so that
>>is possible.
>>
>>So, perhaps the data has some questionable aspects to it.  The only part that
>>I am _certain_ is "raw data" is the individual speedup values, because those
>>are what we were looking at specifically.  I had not remembered the node count
>>problem until this email came in, and then I remembered a case where Vincent
>>was trying to prove something about Crafty and got node counts suggesting that
>>it should have gotten a > 2.0 speedup.  I had pointed out that the way I do
>>node counting, it is impossible to produce an accurate number at any point
>>except when all processors are idle.  I _should_ have remembered that we had
>>the same problem back then.  I am therefore afraid that the times might have
>>been computed in the same way since it would have been quite natural to do
>>so...
>>
>>I don't think this changes one iota about what is going on, of course.  Given
>>a speedup and the total time used by Crafty, I can certainly compute a node
>>count that will be _very_ close to the real one.  I suppose I should add that,
>>so that Vincent can have his "every time the PV changes give me nodes" type
>>of value.
>>
>>Keep in mind that this was an email from someone that worked on this with me
>>back then.  His memory was somewhat better because he actually wrote the code
>>to solve the problem.  But again, he was _very_ vague in remembering everything.
>>It took a phone call for us to discuss this to get as far as I did above.  I
>>might remember more as time goes on.
>>
>>But the bottom line is "trust the speedup numbers explicitly".  And if you
>>trust them, the others can be directly derived from them.  For 16 cpus, Cray
>>Blitz generally searched 100% of the time on each cpu.  If it produced a speedup
>>of 16, then each cpu searched 1/16th of the total nodes searched by one
>>processor.  If it produced a speedup of 8, then each cpu searched 1/8 of the
>>nodes searched by one processor, so all 16 together searched 2x the
>>one-processor total; the extra nodes are the search overhead.
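
In code, with made-up numbers for the example:

  #include <stdio.h>

  int main(void) {
    long nodes_1 = 100000000;  /* invented 1-cpu node count */
    int n = 16;                /* processors, each ~100% busy */
    double s = 8.0;            /* measured speedup */

    double per_cpu = nodes_1 / s;  /* each cpu searches nodes_1/s nodes */
    double total = n * per_cpu;    /* all 16 together: n/s times nodes_1 */

    printf("per cpu %.0f, total %.0f, overhead %.0f%%\n",
           per_cpu, total, (total / nodes_1 - 1.0) * 100.0);
    /* prints: per cpu 12500000, total 200000000, overhead 100% */
    return 0;
  }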
>>
>>Sorry for the confusion.  Stuff done 10 years ago is difficult enough to
>>remember.  Remembering the "log eater" was harder since I didn't write all of
>>it...
>>
>>Bob


