Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: DIEP parallel in Paderborn - technical and detailed story

Author: Robert Hyatt

Date: 11:11:55 06/29/99

Go up one level in this thread


On June 29, 1999 at 10:38:17, Vincent Diepeveen wrote:

>On June 29, 1999 at 10:19:32, Vincent Diepeveen wrote:
>
>>On June 29, 1999 at 09:56:48, Robert Hyatt wrote:
>>
>>>On June 29, 1999 at 09:07:40, Ernst A. Heinz wrote:
>>>
>>>>On June 28, 1999 at 18:19:59, Vincent Diepeveen wrote:
>>>>
>>>>> [...] and that's kind of tough
>>>>>in the parallellism i use (which is an improved cray blitz version).
>>>>
>>>>Vincent,
>>>>
>>>>Your above statement is most amusing. Nobody else would ever call his
>>>>buggy and not even deadlock-free parallel search an "improvement".
>>>>Why don't you first establish the soundness of your implementation
>>>>before measuring its "phenomenal" effectiveness ...
>>>>
>>>>You surely know that anything is deducible from unsound hypotheses.
>>>>
>>>>> [...]
>>>>>
>>>>>The speedup of the 4x400Mhz xeon with 450Mb hash compared to
>>>>>a PII-450 NT is about 5 times for DIEP, as the speedup of 4 processors
>>>>>for DIEP is 4.06 . The more difficult a position is the
>>>>>bigger the speedup. Speedups over 10 times at 4 processors are not an
>>>>>exception for DIEP. Now this is theoretical not possible will some say.
>>>>>Dead wrong. It is.
>>>>
>>>>So-called "superlinear speed-up" due to cache and memory effects is indeed
>>>>possible but highly unlikely in the case of parallelizing a *sophisticated*
>>>>alpha-beta searcher. However, "Diep" may just not be so super-sophisticated
>>>>as you constantly claim. Do you also get such extreme improvements for your
>>>>sequential program version when adding processor cache and main memory on
>>>>a single-CPU system?
>>>>
>>>>The most likely explanation for your speed-up observations is that your
>>>>buggy and not even deadlock-free parallel search is unsound and, therefore,
>>>>lets you deduce and measure anything ... (see above)
>>>>
>>>>=Ernst=
>>>
>>>
>>>Actually, super-linear speedups happen all the time, in selected problems.
>>>Bruce and I have been comparing notes on one particular set of positions, and
>>>for me, one of the positions runs 10x faster with 4 cpus.  And a couple of
>>>others run 5-6 times faster than with 1 cpu.  But, of course, on average,
>>>the overall speedup is always <= 4 (3.2 in the case of current crafty, while
>>>Cray Blitz was closer to 3.7).
>>>
>>>This is generally explained by poor move ordering, where in parallel you search
>>>the best branch before you finish searching non-best branches...
>>>
>>>But you are really right...  until a program runs reliably (and this takes quite
>>>a while to reach) measuring performance is not very useful, because the
>>>performance of a bug-ridden program can be _very_ misleading.  As I have worked
>>>on the parallel search in Crafty I have run into this many times...
>>>
>>>Consistent speedups > 4 imply a very inefficient sequential search, move-
>>>ordering wise.
>>
>>2 things
>>   a) when searching at bigger depths my flip rate becomes less and less
>>      and gets back to less than a percent, because of hashtable storage,
>>      where in crafty this is 5%
>>   b) i am not using recursion so that already gives me a lot of extra speedup
>>      as i hardly split at leafs in contradiction to crafty
>>   c) my move ordering indeed is not that well, which is mainly caused by the
>>      huge evaluation function of my program
>>   d) i get the same mainlines and rootscores when not using alfabeta dependant
>>      extensions.
>>   e) for cray blitz you reported a 3.9 speedup
>
>apology in icca 1997-1 you claim at
>page 16 at 4 processors a 3.7 speedup
>and not 3.9
>
>I'll run the same positions over DIEP this afternoon and tomorrow
>i'll publice the result.
>
>I see you tested from move 9 for black.
>i assume that the 24th test position was move 32 for black.
>
>Greetings,
>Vincent
>

correct... but note that I carried _everything_ from move to move, hash table,
killer moves, etc...  because this was a complete series of moves from a real
game...  I think this gives better data for how it does in a game than does
testing random unrelated positions...




>
>>So i have no reason to not assume that my speedup is based upon bugs which
>>have the nice feature that they cause a form of pruning where the rest of
>>the world can only dream of.
>>
>>The number of tests where i base my speedup
>>on is comparable with number of tests done for writings in the iccaj.
>>
>>The big question in this is: how deadly is the fact that my
>>move ordering is 80% for the first move, and that in crafty this is 90%.
>>
>>I DON'T SPLIT IN CUTNODES. I ONLY SPLIT IN <= ALFA NODES.
>>
>>And in <= alfa nodes i'm doing very good: less than 1% chance that it flips,
>>where in crafty this chance is 5%.
>>
>>Together with the fact that in DIEP i suffer less from locking as all
>>my systemtime goes to the evaluation anyway, i think i'm allowed to conclude
>>that my speedup is 4.0
>>
>>I've recently rewritten the automatic testing for DIEP.
>>If you provide me some positions i don't mind producing 2 outputs:
>>
>>a PRO200 searching single cpu. a pro200 searching 4 cpu's.
>>
>>Now we can compare results. We also can compare how many nodes
>>DIEP needs to get to a certain depth and how many crafty
>>needs to get to a certain depth. So we directly see how well my
>>move ordering is then.
>>
>>Greetings,
>>Vincent



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.