Computer Chess Club Archives

Search

Terms
Messages

Subject: Re: DIEP parallel in Paderborn - technical and detailed story

Author: Robert Hyatt
Date: 07:47:29 06/30/99
On June 30, 1999 at 05:36:27, Vincent Diepeveen wrote:

>On June 29, 1999 at 14:10:26, Robert Hyatt wrote:
>
>>On June 29, 1999 at 10:19:32, Vincent Diepeveen wrote:
>>
>>>On June 29, 1999 at 09:56:48, Robert Hyatt wrote:
>>>
>>>>On June 29, 1999 at 09:07:40, Ernst A. Heinz wrote:
>>>>
>>>>>On June 28, 1999 at 18:19:59, Vincent Diepeveen wrote:
>>>>>
>>>>>> [...] and that's kind of tough
>>>>>>in the parallellism i use (which is an improved cray blitz version).
>>>>>
>>>>>Vincent,
>>>>>
>>>>>Your above statement is most amusing. Nobody else would ever call his
>>>>>buggy and not even deadlock-free parallel search an "improvement".
>>>>>Why don't you first establish the soundness of your implementation
>>>>>before measuring its "phenomenal" effectiveness ...
>>>>>
>>>>>You surely know that anything is deducible from unsound hypotheses.
>>>>>
>>>>>> [...]
>>>>>>
>>>>>>The speedup of the 4x400Mhz xeon with 450Mb hash compared to
>>>>>>a PII-450 NT is about 5 times for DIEP, as the speedup of 4 processors
>>>>>>for DIEP is 4.06 . The more difficult a position is the
>>>>>>bigger the speedup. Speedups over 10 times at 4 processors are not an
>>>>>>exception for DIEP. Now this is theoretical not possible will some say.
>>>>>>Dead wrong. It is.
>>>>>
>>>>>So-called "superlinear speed-up" due to cache and memory effects is indeed
>>>>>possible but highly unlikely in the case of parallelizing a *sophisticated*
>>>>>alpha-beta searcher. However, "Diep" may just not be so super-sophisticated
>>>>>as you constantly claim. Do you also get such extreme improvements for your
>>>>>sequential program version when adding processor cache and main memory on
>>>>>a single-CPU system?
>>>>>
>>>>>The most likely explanation for your speed-up observations is that your
>>>>>buggy and not even deadlock-free parallel search is unsound and, therefore,
>>>>>lets you deduce and measure anything ... (see above)
>>>>>
>>>>>=Ernst=
>>>>
>>>>
>>>>Actually, super-linear speedups happen all the time, in selected problems.
>>>>Bruce and I have been comparing notes on one particular set of positions, and
>>>>for me, one of the positions runs 10x faster with 4 cpus.  And a couple of
>>>>others run 5-6 times faster than with 1 cpu.  But, of course, on average,
>>>>the overall speedup is always <= 4 (3.2 in the case of current crafty, while
>>>>Cray Blitz was closer to 3.7).
>>>>
>>>>This is generally explained by poor move ordering, where in parallel you search
>>>>the best branch before you finish searching non-best branches...
>>>>
>>>>But you are really right...  until a program runs reliably (and this takes quite
>>>>a while to reach) measuring performance is not very useful, because the
>>>>performance of a bug-ridden program can be _very_ misleading.  As I have worked
>>>>on the parallel search in Crafty I have run into this many times...
>>>>
>>>>Consistent speedups > 4 imply a very inefficient sequential search, move-
>>>>ordering wise.
>>>
>>>2 things
>>>   a) when searching at bigger depths my flip rate becomes less and less
>>>      and gets back to less than a percent, because of hashtable storage,
>>>      where in crafty this is 5%
>>>   b) i am not using recursion so that already gives me a lot of extra speedup
>>>      as i hardly split at leafs in contradiction to crafty
>>>   c) my move ordering indeed is not that well, which is mainly caused by the
>>>      huge evaluation function of my program
>>>   d) i get the same mainlines and rootscores when not using alfabeta dependant
>>>      extensions.
>>>   e) for cray blitz you reported a 3.9 speedup
>>
>>we aren't comparing apples to apples.  The 3.9 was in a _game_.  Not over
>>a set of random test positions.  In this game, the hash was carried from move
>>to move unchanged, killer moves were carried along, the PV was carried along.
>>Etc..  Much easier to get a good speedup in a game, than in random unrelated
>>positions where each new position has _no_ move ordering help at all...
>>
>>
>>
>>>
>>>So i have no reason to not assume that my speedup is based upon bugs which
>>>have the nice feature that they cause a form of pruning where the rest of
>>>the world can only dream of.
>>>
>>>The number of tests where i base my speedup
>>>on is comparable with number of tests done for writings in the iccaj.
>>>
>>>The big question in this is: how deadly is the fact that my
>>>move ordering is 80% for the first move, and that in crafty this is 90%.
>>>
>>>I DON'T SPLIT IN CUTNODES. I ONLY SPLIT IN <= ALFA NODES.
>>
>>No way you can guarantee this.  Without _perfect_ move ordering.  IE I couldn't
>>in Cray Blitz, I can't in Crafty.  Because until a branch is search, there is
>>no way to be _sure_ you search the best move first...
>>
>>
>>
>>>
>>>And in <= alfa nodes i'm doing very good: less than 1% chance that it flips,
>>>where in crafty this chance is 5%.
>>>
>>>Together with the fact that in DIEP i suffer less from locking as all
>>>my systemtime goes to the evaluation anyway, i think i'm allowed to conclude
>>>that my speedup is 4.0
>>
>>I don't have _any_ time lost in locks.  At least on my quad xeon and on the
>>16 processor Cray.  On the 32 processor machine we saw a bit of inefficiency
>>with locks, but that was cleared up by making the locks finer-grained.
>>
>>
>>>
>>>I've recently rewritten the automatic testing for DIEP.
>>>If you provide me some positions i don't mind producing 2 outputs:
>>>
>>>a PRO200 searching single cpu. a pro200 searching 4 cpu's.
>>>
>>>Now we can compare results. We also can compare how many nodes
>>>DIEP needs to get to a certain depth and how many crafty
>>>needs to get to a certain depth. So we directly see how well my
>>>move ordering is then.
>>>
>>>Greetings,
>>>Vincent
>>
>>
>>I don't see how we can compare nodes when our searches are different.  IE
>>I don't do _any_ forward pruning at all except for that which is a result of
>>null-move pruning.  I don't do checks in the q-search.  So we can compare,
>>but we are comparing apples to oranges...
>>If you want a test suite, I can give you the one Bruce and I have been playing
>>with...
>>I am currently getting 3.2X on it...
>
>But anyway if i would get 4.0x on it you would still argue that our
>programs are incomparable, although in ICCA you compare your speedup also
>with other programs, which are way more or way less
>selective than Cray Blitz was!
>
>Greetings,
>Vincent


again you mix apples and oranges.  We can compare _speedup_ all you want.  Bruce
and I do this regularly, even though our searches are different.  We do _not_
compare nodes, because our trees are shaped differently.

If you get 4.0, that would be amazing.  And it _must_ be calculated as:

       time to do N-ply search with 4 processors
       -----------------------------------------
       time to do N-ply search with 1 processor

If you get 4 there, it means that you made _no_ bad decisions on splitting
the tree, something that seems impossible in practice.  Yes you can get
results > 4 on occasion, if you have poor move ordering.  The better your
mover ordering, the less often that will happen however.   Of course the
better your move ordering the longer it takes you to solve things like fine
#70 too...
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.