Computer Chess Club Archives


Subject: Re: DIEP parallel in Paderborn - technical and detailed story

Author: Vincent Diepeveen

Date: 07:19:32 06/29/99


On June 29, 1999 at 09:56:48, Robert Hyatt wrote:

>On June 29, 1999 at 09:07:40, Ernst A. Heinz wrote:
>
>>On June 28, 1999 at 18:19:59, Vincent Diepeveen wrote:
>>
>>> [...] and that's kind of tough
>>>in the parallellism i use (which is an improved cray blitz version).
>>
>>Vincent,
>>
>>Your above statement is most amusing. Nobody else would ever call his
>>buggy and not even deadlock-free parallel search an "improvement".
>>Why don't you first establish the soundness of your implementation
>>before measuring its "phenomenal" effectiveness ...
>>
>>You surely know that anything is deducible from unsound hypotheses.
>>
>>> [...]
>>>
>>>The speedup of the 4x400Mhz xeon with 450Mb hash compared to
>>>a PII-450 NT is about 5 times for DIEP, as the speedup of 4 processors
>>>for DIEP is 4.06 . The more difficult a position is the
>>>bigger the speedup. Speedups over 10 times at 4 processors are not an
>>>exception for DIEP. Now this is theoretical not possible will some say.
>>>Dead wrong. It is.
>>
>>So-called "superlinear speed-up" due to cache and memory effects is indeed
>>possible but highly unlikely in the case of parallelizing a *sophisticated*
>>alpha-beta searcher. However, "Diep" may just not be so super-sophisticated
>>as you constantly claim. Do you also get such extreme improvements for your
>>sequential program version when adding processor cache and main memory on
>>a single-CPU system?
>>
>>The most likely explanation for your speed-up observations is that your
>>buggy and not even deadlock-free parallel search is unsound and, therefore,
>>lets you deduce and measure anything ... (see above)
>>
>>=Ernst=
>
>
>Actually, super-linear speedups happen all the time, in selected problems.
>Bruce and I have been comparing notes on one particular set of positions, and
>for me, one of the positions runs 10x faster with 4 cpus.  And a couple of
>others run 5-6 times faster than with 1 cpu.  But, of course, on average,
>the overall speedup is always <= 4 (3.2 in the case of current crafty, while
>Cray Blitz was closer to 3.7).
>
>This is generally explained by poor move ordering, where in parallel you search
>the best branch before you finish searching non-best branches...
>
>But you are really right...  until a program runs reliably (and this takes quite
>a while to reach) measuring performance is not very useful, because the
>performance of a bug-ridden program can be _very_ misleading.  As I have worked
>on the parallel search in Crafty I have run into this many times...
>
>Consistent speedups > 4 imply a very inefficient sequential search, move-
>ordering wise.

A few things:
   a) when searching at bigger depths my flip rate becomes less and less
      and falls back to less than a percent, because of hashtable storage,
      where in Crafty this is 5%
   b) I am not using recursion, so that already gives me a lot of extra speedup,
      as I hardly split at leaves, in contrast to Crafty
   c) my move ordering indeed is not that good, which is mainly caused by the
      huge evaluation function of my program
   d) I get the same mainlines and root scores when not using alpha-beta dependent
      extensions.
   e) for Cray Blitz you reported a 3.9 speedup
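
Point c) ties in with the explanation quoted above: with poor move ordering, a parallel search can reach the refuting branch long before a sequential search would. Below is a minimal toy model of that effect, not DIEP's or Crafty's actual split logic. It assumes every branch of one node has a fixed cost, that exactly one branch produces the cutoff, and that idle workers simply take the next branch in order. The names `sequential_time` and `parallel_time` are made up for this sketch.

```python
import heapq

def sequential_time(costs, cutoff_index):
    """Time for one CPU: search branches in order until the cutoff branch is done."""
    return sum(costs[:cutoff_index + 1])

def parallel_time(costs, cutoff_index, workers):
    """Time for `workers` CPUs: each free worker takes the next branch in order.

    The node is finished as soon as the cutoff branch completes; all other
    work is assumed to be aborted at that moment at no cost.
    """
    free_at = [0.0] * workers          # time at which each worker becomes idle
    heapq.heapify(free_at)
    for i, cost in enumerate(costs):
        start = heapq.heappop(free_at)  # earliest idle worker takes branch i
        finish = start + cost
        if i == cutoff_index:
            return finish
        heapq.heappush(free_at, finish)
    raise ValueError("cutoff_index is outside the branch list")

# Hypothetical costs: the refuting move is ordered last and is cheap to search.
bad_order = [100, 100, 100, 10]
t1 = sequential_time(bad_order, 3)
t4 = parallel_time(bad_order, 3, 4)
print("badly ordered node:", t1, t4, t1 / t4)

# Same branches with the refuting move ordered first: nothing to gain.
good_order = [10, 100, 100, 100]
print("well ordered node:", sequential_time(good_order, 0),
      parallel_time(good_order, 0, 4))
```

In this model the badly ordered node gains far more than a factor of 4 from 4 workers, while the well ordered node gains nothing, which is why per-position speedups above the processor count point at move ordering rather than at the parallel code itself.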

So I have no reason to assume that my speedup is based upon bugs which
happen to have the nice feature that they cause a form of pruning that the
rest of the world can only dream of.

The number of tests on which I base my speedup
is comparable with the number of tests done for articles in the ICCAJ.

The big question in this is: how deadly is the fact that my
move ordering is 80% for the first move, while in Crafty it is 90%?
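
For reference, a minimal sketch of how such a first-move percentage is commonly counted, assuming the 80% and 90% figures mean "share of beta cutoffs that were produced by the first move searched". The class `CutoffStats` and its methods are hypothetical names, not taken from DIEP or Crafty.

```python
from dataclasses import dataclass

@dataclass
class CutoffStats:
    cutoffs: int = 0             # nodes where some move failed high
    first_move_cutoffs: int = 0  # ...and that move was the first one tried

    def record(self, move_number: int) -> None:
        """Call once per beta cutoff; move_number is 1-based."""
        self.cutoffs += 1
        if move_number == 1:
            self.first_move_cutoffs += 1

    def first_move_rate(self) -> float:
        """Fraction of cutoffs that came from the first move (0.0 if none)."""
        if self.cutoffs == 0:
            return 0.0
        return self.first_move_cutoffs / self.cutoffs

stats = CutoffStats()
for move_number in (1, 1, 1, 1, 3):   # made-up cutoff events
    stats.record(move_number)
print(stats.first_move_rate())
```

In a real search the call to `record` would sit at the point where the score reaches beta and the move loop is abandoned.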

I DON'T SPLIT IN CUTNODES. I ONLY SPLIT IN <= ALPHA NODES.

And in <= alpha nodes I'm doing very well: less than a 1% chance that it flips,
where in Crafty this chance is 5%.

Together with the fact that in DIEP I suffer less from locking, as all
my system time goes to the evaluation anyway, I think I'm allowed to conclude
that my speedup is 4.0.
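
A small sketch of how a single speedup figure can be derived from a test set, and how one position at 10x can coexist with an overall figure of 4.0. The timings are invented purely for illustration; they are not DIEP or Crafty measurements, and the function names are hypothetical.

```python
def per_position_speedups(t_single, t_parallel):
    """Speedup of each position: single-CPU time divided by parallel time."""
    return [a / b for a, b in zip(t_single, t_parallel)]

def overall_speedup(t_single, t_parallel):
    """Speedup over the whole set: total single-CPU time / total parallel time."""
    return sum(t_single) / sum(t_parallel)

# Invented seconds-to-depth for three positions, 1 CPU versus 4 CPUs.
t1 = [100.0, 60.0, 80.0]
t4 = [10.0, 20.0, 30.0]

print(per_position_speedups(t1, t4))
print(overall_speedup(t1, t4))
```

Which of the two numbers is quoted matters: the average of per-position ratios is pulled up by a few superlinear positions, while the ratio of total times weights each position by how long it actually takes.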

I've recently rewritten the automatic testing for DIEP.
If you provide me with some positions I don't mind producing 2 outputs:

a PRO200 searching with a single CPU, and a PRO200 searching with 4 CPUs.

Then we can compare results. We can also compare how many nodes
DIEP needs to get to a certain depth and how many Crafty
needs to get to the same depth. So we directly see how good my
move ordering is.
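
A rough sketch of that comparison, assuming each program's output has already been reduced to a mapping from search depth to nodes needed to complete it. The function name and the numbers are hypothetical. Depths of different programs are also only loosely comparable, since extensions and pruning differ, so the ratio is an indication rather than a proof.

```python
def node_ratio_by_depth(nodes_a, nodes_b):
    """For every depth both runs completed, return nodes_a / nodes_b."""
    shared_depths = sorted(set(nodes_a) & set(nodes_b))
    return {d: nodes_a[d] / nodes_b[d] for d in shared_depths}

# Invented node counts per completed depth for two engines on one position.
engine_a = {8: 1000, 9: 4000, 10: 12000}
engine_b = {8: 500, 9: 1600, 11: 9000}

for depth, ratio in node_ratio_by_depth(engine_a, engine_b).items():
    print(f"depth {depth}: engine A needs {ratio:.2f}x the nodes of engine B")
```

The same function works for comparing a single-CPU run against a 4-CPU run of one program, where the ratio then shows the search overhead of the parallel version.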

Greetings,
Vincent






Last modified: Thu, 15 Apr 21 08:11:13 -0700