Computer Chess Club Archives

Search

Terms
Messages

Subject: Re: DIEP parallel in Paderborn - technical and detailed story

Author: Vincent Diepeveen
Date: 07:38:17 06/29/99
On June 29, 1999 at 10:19:32, Vincent Diepeveen wrote:

>On June 29, 1999 at 09:56:48, Robert Hyatt wrote:
>
>>On June 29, 1999 at 09:07:40, Ernst A. Heinz wrote:
>>
>>>On June 28, 1999 at 18:19:59, Vincent Diepeveen wrote:
>>>
>>>> [...] and that's kind of tough
>>>>in the parallellism i use (which is an improved cray blitz version).
>>>
>>>Vincent,
>>>
>>>Your above statement is most amusing. Nobody else would ever call his
>>>buggy and not even deadlock-free parallel search an "improvement".
>>>Why don't you first establish the soundness of your implementation
>>>before measuring its "phenomenal" effectiveness ...
>>>
>>>You surely know that anything is deducible from unsound hypotheses.
>>>
>>>> [...]
>>>>
>>>>The speedup of the 4x400Mhz xeon with 450Mb hash compared to
>>>>a PII-450 NT is about 5 times for DIEP, as the speedup of 4 processors
>>>>for DIEP is 4.06 . The more difficult a position is the
>>>>bigger the speedup. Speedups over 10 times at 4 processors are not an
>>>>exception for DIEP. Now this is theoretical not possible will some say.
>>>>Dead wrong. It is.
>>>
>>>So-called "superlinear speed-up" due to cache and memory effects is indeed
>>>possible but highly unlikely in the case of parallelizing a *sophisticated*
>>>alpha-beta searcher. However, "Diep" may just not be so super-sophisticated
>>>as you constantly claim. Do you also get such extreme improvements for your
>>>sequential program version when adding processor cache and main memory on
>>>a single-CPU system?
>>>
>>>The most likely explanation for your speed-up observations is that your
>>>buggy and not even deadlock-free parallel search is unsound and, therefore,
>>>lets you deduce and measure anything ... (see above)
>>>
>>>=Ernst=
>>
>>
>>Actually, super-linear speedups happen all the time, in selected problems.
>>Bruce and I have been comparing notes on one particular set of positions, and
>>for me, one of the positions runs 10x faster with 4 cpus.  And a couple of
>>others run 5-6 times faster than with 1 cpu.  But, of course, on average,
>>the overall speedup is always <= 4 (3.2 in the case of current crafty, while
>>Cray Blitz was closer to 3.7).
>>
>>This is generally explained by poor move ordering, where in parallel you search
>>the best branch before you finish searching non-best branches...
>>
>>But you are really right...  until a program runs reliably (and this takes quite
>>a while to reach) measuring performance is not very useful, because the
>>performance of a bug-ridden program can be _very_ misleading.  As I have worked
>>on the parallel search in Crafty I have run into this many times...
>>
>>Consistent speedups > 4 imply a very inefficient sequential search, move-
>>ordering wise.
>
>2 things
>   a) when searching at bigger depths my flip rate becomes less and less
>      and gets back to less than a percent, because of hashtable storage,
>      where in crafty this is 5%
>   b) i am not using recursion so that already gives me a lot of extra speedup
>      as i hardly split at leafs in contradiction to crafty
>   c) my move ordering indeed is not that well, which is mainly caused by the
>      huge evaluation function of my program
>   d) i get the same mainlines and rootscores when not using alfabeta dependant
>      extensions.
>   e) for cray blitz you reported a 3.9 speedup

apology in icca 1997-1 you claim at
page 16 at 4 processors a 3.7 speedup
and not 3.9

I'll run the same positions over DIEP this afternoon and tomorrow
i'll publice the result.

I see you tested from move 9 for black.
i assume that the 24th test position was move 32 for black.

Greetings,
Vincent


>So i have no reason to not assume that my speedup is based upon bugs which
>have the nice feature that they cause a form of pruning where the rest of
>the world can only dream of.
>
>The number of tests where i base my speedup
>on is comparable with number of tests done for writings in the iccaj.
>
>The big question in this is: how deadly is the fact that my
>move ordering is 80% for the first move, and that in crafty this is 90%.
>
>I DON'T SPLIT IN CUTNODES. I ONLY SPLIT IN <= ALFA NODES.
>
>And in <= alfa nodes i'm doing very good: less than 1% chance that it flips,
>where in crafty this chance is 5%.
>
>Together with the fact that in DIEP i suffer less from locking as all
>my systemtime goes to the evaluation anyway, i think i'm allowed to conclude
>that my speedup is 4.0
>
>I've recently rewritten the automatic testing for DIEP.
>If you provide me some positions i don't mind producing 2 outputs:
>
>a PRO200 searching single cpu. a pro200 searching 4 cpu's.
>
>Now we can compare results. We also can compare how many nodes
>DIEP needs to get to a certain depth and how many crafty
>needs to get to a certain depth. So we directly see how well my
>move ordering is then.
>
>Greetings,
>Vincent
Re: DIEP parallel in Paderborn - technical and detailed story Robert Hyatt 11:11:55 06/29/99
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.