Author: Vincent Diepeveen
Date: 07:38:17 06/29/99
Go up one level in this thread
On June 29, 1999 at 10:19:32, Vincent Diepeveen wrote: >On June 29, 1999 at 09:56:48, Robert Hyatt wrote: > >>On June 29, 1999 at 09:07:40, Ernst A. Heinz wrote: >> >>>On June 28, 1999 at 18:19:59, Vincent Diepeveen wrote: >>> >>>> [...] and that's kind of tough >>>>in the parallellism i use (which is an improved cray blitz version). >>> >>>Vincent, >>> >>>Your above statement is most amusing. Nobody else would ever call his >>>buggy and not even deadlock-free parallel search an "improvement". >>>Why don't you first establish the soundness of your implementation >>>before measuring its "phenomenal" effectiveness ... >>> >>>You surely know that anything is deducible from unsound hypotheses. >>> >>>> [...] >>>> >>>>The speedup of the 4x400Mhz xeon with 450Mb hash compared to >>>>a PII-450 NT is about 5 times for DIEP, as the speedup of 4 processors >>>>for DIEP is 4.06 . The more difficult a position is the >>>>bigger the speedup. Speedups over 10 times at 4 processors are not an >>>>exception for DIEP. Now this is theoretical not possible will some say. >>>>Dead wrong. It is. >>> >>>So-called "superlinear speed-up" due to cache and memory effects is indeed >>>possible but highly unlikely in the case of parallelizing a *sophisticated* >>>alpha-beta searcher. However, "Diep" may just not be so super-sophisticated >>>as you constantly claim. Do you also get such extreme improvements for your >>>sequential program version when adding processor cache and main memory on >>>a single-CPU system? >>> >>>The most likely explanation for your speed-up observations is that your >>>buggy and not even deadlock-free parallel search is unsound and, therefore, >>>lets you deduce and measure anything ... (see above) >>> >>>=Ernst= >> >> >>Actually, super-linear speedups happen all the time, in selected problems. >>Bruce and I have been comparing notes on one particular set of positions, and >>for me, one of the positions runs 10x faster with 4 cpus. And a couple of >>others run 5-6 times faster than with 1 cpu. But, of course, on average, >>the overall speedup is always <= 4 (3.2 in the case of current crafty, while >>Cray Blitz was closer to 3.7). >> >>This is generally explained by poor move ordering, where in parallel you search >>the best branch before you finish searching non-best branches... >> >>But you are really right... until a program runs reliably (and this takes quite >>a while to reach) measuring performance is not very useful, because the >>performance of a bug-ridden program can be _very_ misleading. As I have worked >>on the parallel search in Crafty I have run into this many times... >> >>Consistent speedups > 4 imply a very inefficient sequential search, move- >>ordering wise. > >2 things > a) when searching at bigger depths my flip rate becomes less and less > and gets back to less than a percent, because of hashtable storage, > where in crafty this is 5% > b) i am not using recursion so that already gives me a lot of extra speedup > as i hardly split at leafs in contradiction to crafty > c) my move ordering indeed is not that well, which is mainly caused by the > huge evaluation function of my program > d) i get the same mainlines and rootscores when not using alfabeta dependant > extensions. > e) for cray blitz you reported a 3.9 speedup apology in icca 1997-1 you claim at page 16 at 4 processors a 3.7 speedup and not 3.9 I'll run the same positions over DIEP this afternoon and tomorrow i'll publice the result. I see you tested from move 9 for black. i assume that the 24th test position was move 32 for black. Greetings, Vincent >So i have no reason to not assume that my speedup is based upon bugs which >have the nice feature that they cause a form of pruning where the rest of >the world can only dream of. > >The number of tests where i base my speedup >on is comparable with number of tests done for writings in the iccaj. > >The big question in this is: how deadly is the fact that my >move ordering is 80% for the first move, and that in crafty this is 90%. > >I DON'T SPLIT IN CUTNODES. I ONLY SPLIT IN <= ALFA NODES. > >And in <= alfa nodes i'm doing very good: less than 1% chance that it flips, >where in crafty this chance is 5%. > >Together with the fact that in DIEP i suffer less from locking as all >my systemtime goes to the evaluation anyway, i think i'm allowed to conclude >that my speedup is 4.0 > >I've recently rewritten the automatic testing for DIEP. >If you provide me some positions i don't mind producing 2 outputs: > >a PRO200 searching single cpu. a pro200 searching 4 cpu's. > >Now we can compare results. We also can compare how many nodes >DIEP needs to get to a certain depth and how many crafty >needs to get to a certain depth. So we directly see how well my >move ordering is then. > >Greetings, >Vincent
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.