Author: Robert Hyatt
Date: 11:11:55 06/29/99
Go up one level in this thread
On June 29, 1999 at 10:38:17, Vincent Diepeveen wrote: >On June 29, 1999 at 10:19:32, Vincent Diepeveen wrote: > >>On June 29, 1999 at 09:56:48, Robert Hyatt wrote: >> >>>On June 29, 1999 at 09:07:40, Ernst A. Heinz wrote: >>> >>>>On June 28, 1999 at 18:19:59, Vincent Diepeveen wrote: >>>> >>>>> [...] and that's kind of tough >>>>>in the parallellism i use (which is an improved cray blitz version). >>>> >>>>Vincent, >>>> >>>>Your above statement is most amusing. Nobody else would ever call his >>>>buggy and not even deadlock-free parallel search an "improvement". >>>>Why don't you first establish the soundness of your implementation >>>>before measuring its "phenomenal" effectiveness ... >>>> >>>>You surely know that anything is deducible from unsound hypotheses. >>>> >>>>> [...] >>>>> >>>>>The speedup of the 4x400Mhz xeon with 450Mb hash compared to >>>>>a PII-450 NT is about 5 times for DIEP, as the speedup of 4 processors >>>>>for DIEP is 4.06 . The more difficult a position is the >>>>>bigger the speedup. Speedups over 10 times at 4 processors are not an >>>>>exception for DIEP. Now this is theoretical not possible will some say. >>>>>Dead wrong. It is. >>>> >>>>So-called "superlinear speed-up" due to cache and memory effects is indeed >>>>possible but highly unlikely in the case of parallelizing a *sophisticated* >>>>alpha-beta searcher. However, "Diep" may just not be so super-sophisticated >>>>as you constantly claim. Do you also get such extreme improvements for your >>>>sequential program version when adding processor cache and main memory on >>>>a single-CPU system? >>>> >>>>The most likely explanation for your speed-up observations is that your >>>>buggy and not even deadlock-free parallel search is unsound and, therefore, >>>>lets you deduce and measure anything ... (see above) >>>> >>>>=Ernst= >>> >>> >>>Actually, super-linear speedups happen all the time, in selected problems. >>>Bruce and I have been comparing notes on one particular set of positions, and >>>for me, one of the positions runs 10x faster with 4 cpus. And a couple of >>>others run 5-6 times faster than with 1 cpu. But, of course, on average, >>>the overall speedup is always <= 4 (3.2 in the case of current crafty, while >>>Cray Blitz was closer to 3.7). >>> >>>This is generally explained by poor move ordering, where in parallel you search >>>the best branch before you finish searching non-best branches... >>> >>>But you are really right... until a program runs reliably (and this takes quite >>>a while to reach) measuring performance is not very useful, because the >>>performance of a bug-ridden program can be _very_ misleading. As I have worked >>>on the parallel search in Crafty I have run into this many times... >>> >>>Consistent speedups > 4 imply a very inefficient sequential search, move- >>>ordering wise. >> >>2 things >> a) when searching at bigger depths my flip rate becomes less and less >> and gets back to less than a percent, because of hashtable storage, >> where in crafty this is 5% >> b) i am not using recursion so that already gives me a lot of extra speedup >> as i hardly split at leafs in contradiction to crafty >> c) my move ordering indeed is not that well, which is mainly caused by the >> huge evaluation function of my program >> d) i get the same mainlines and rootscores when not using alfabeta dependant >> extensions. >> e) for cray blitz you reported a 3.9 speedup > >apology in icca 1997-1 you claim at >page 16 at 4 processors a 3.7 speedup >and not 3.9 > >I'll run the same positions over DIEP this afternoon and tomorrow >i'll publice the result. > >I see you tested from move 9 for black. >i assume that the 24th test position was move 32 for black. > >Greetings, >Vincent > correct... but note that I carried _everything_ from move to move, hash table, killer moves, etc... because this was a complete series of moves from a real game... I think this gives better data for how it does in a game than does testing random unrelated positions... > >>So i have no reason to not assume that my speedup is based upon bugs which >>have the nice feature that they cause a form of pruning where the rest of >>the world can only dream of. >> >>The number of tests where i base my speedup >>on is comparable with number of tests done for writings in the iccaj. >> >>The big question in this is: how deadly is the fact that my >>move ordering is 80% for the first move, and that in crafty this is 90%. >> >>I DON'T SPLIT IN CUTNODES. I ONLY SPLIT IN <= ALFA NODES. >> >>And in <= alfa nodes i'm doing very good: less than 1% chance that it flips, >>where in crafty this chance is 5%. >> >>Together with the fact that in DIEP i suffer less from locking as all >>my systemtime goes to the evaluation anyway, i think i'm allowed to conclude >>that my speedup is 4.0 >> >>I've recently rewritten the automatic testing for DIEP. >>If you provide me some positions i don't mind producing 2 outputs: >> >>a PRO200 searching single cpu. a pro200 searching 4 cpu's. >> >>Now we can compare results. We also can compare how many nodes >>DIEP needs to get to a certain depth and how many crafty >>needs to get to a certain depth. So we directly see how well my >>move ordering is then. >> >>Greetings, >>Vincent
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.