Author: Robert Hyatt
Date: 14:17:07 11/24/98
On November 24, 1998 at 09:10:01, Ernst A. Heinz wrote:

>On November 24, 1998 at 08:29:12, Robert Hyatt wrote:
>>
>> [...]
>>
>>2. In results obtained by me (crafty and Cray Blitz), Cilkchess, and others,
>>the typical overhead is 30% per processor. IE Crafty is about 1.7X faster on
>>two processors, about 2.4X faster on 3, about 3.1X faster on 4, and so forth,
>>thru 8 from personal testing, and thru 16 taking someone else's data.
>
>Bob,
>
>AFAIK your 30% overhead is only a good average approximation for lowly parallel
>searchers on SMPs with *physically* shared hash tables. For massively parallel
>searchers on machines with *physically* distributed memory I have not yet seen
>any experimental data that *conclusively* supports such high parallel
>efficiency. To the contrary, the only frank publications in this respect seem
>to be the articles by the "StarTech" and "StarSocrates" groups who admit to
>something like an application speedup of only 50-60 on a CM-5 with 512 CPUs
>which translates to a parallel efficiency of 10%-15% for their Jamboree search.
>Most other researchers who reported higher relative speedups for their
>massively parallel implementations on distributed-memory machines either failed
>to account for the increases in hash-table sizes or used horribly inefficient
>sequential implementations as their point of reference.
>
>=Ernst=

This isn't really an issue about "shared hash tables". Hash tables don't give a factor of 2 in the middlegame, based on results I've gotten. This is about "process granularity". The *Socrates machine uses a message-passing protocol that is inherently slow. I've run on such machines (the CM-5 for one, the T3D/E for another) and this causes serious problems. The big plus for shared memory is instant communication, so that threads can share information without regard to "cost".
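As a rough check on the numbers in this exchange, here is a minimal Python sketch. It assumes one reading of "30% overhead per processor": each processor after the first contributes about 0.7 of a full processor, which reproduces the 1.7X / 2.4X / 3.1X figures quoted above. The function names and the linear model are illustrative assumptions, not anything taken from Crafty or Cray Blitz, and the extrapolation to 8 and 16 is only what the model predicts, not measured data.

```python
def linear_speedup(n_cpus, per_cpu_gain=0.7):
    """Speedup if every CPU after the first adds `per_cpu_gain` (assumed model)."""
    return 1.0 + per_cpu_gain * (n_cpus - 1)


def parallel_efficiency(speedup, n_cpus):
    """Fraction of ideal (linear) speedup actually achieved."""
    return speedup / n_cpus


if __name__ == "__main__":
    # SMP case: the figures quoted in the post for 2, 3 and 4 CPUs,
    # plus what the same assumed model would predict for 8 and 16.
    for n in (2, 3, 4, 8, 16):
        s = linear_speedup(n)
        e = parallel_efficiency(s, n)
        print(f"{n:2d} CPUs: speedup {s:.1f}x, efficiency {e:.0%}")

    # Distributed-memory case: speedup of 50-60 on a 512-CPU CM-5,
    # as cited in the quoted message.
    for s in (50, 60):
        e = parallel_efficiency(s, 512)
        print(f"512 CPUs: speedup {s}x, efficiency {e:.1%}")
```

Under this model the efficiency on a small SMP stays above 70%, while 50/512 and 60/512 come out at roughly 10% to 12%, which is the gap the two posts are arguing about.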
The DB machine doesn't suffer from the huge CM-5 type cost: they only use 16 (or 32) CPUs, and each CPU talks to the chess processors at bus speeds, not at 2 microseconds/message or whatever, as in the CM and other architectures. In fact, the DB (last edition) chess processors didn't use transposition tables; only the search done on the SP did. And as I had mentioned earlier, they have already factored in a huge "loss" in NPS before arriving at their 250M value. The hardware they used should max out at around 1B nodes per second. Hsu used to report running each chess processor at about 70% of max due to timing issues on who does what part of the search to what depth. I *really* doubt they are only searching an effective 20M nodes per second, because I played them when they had 6 CPUs in Deep Thought and it felt far stronger than Cray Blitz, which was only doing 500K-1M at the time... But I'll see what I can find out precisely about the "box"...
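For scale, the node-rate figures in the paragraph above can be put side by side. This is only a sketch of the arithmetic: the constant names are hypothetical, the values are the approximate ones given in this post, and nothing here describes how the 250M figure was actually derived.

```python
# Approximate figures taken from the post; the names are illustrative only.
PEAK_NPS = 1_000_000_000            # "should max out at around 1B nodes per second"
CHIP_UTILIZATION = 0.70             # "about 70% of max"
REPORTED_NPS = 250_000_000          # the "250M value"
DOUBTED_EFFECTIVE_NPS = 20_000_000  # the "effective 20M" being disputed


def fraction(part, whole):
    """Return part / whole as a plain ratio."""
    return part / whole


# Raw rate if every chess processor ran at the stated utilization.
raw_nps = PEAK_NPS * CHIP_UTILIZATION

if __name__ == "__main__":
    print(f"raw rate at 70% utilization: {raw_nps / 1e6:.0f}M nps")
    print(f"reported / peak:             {fraction(REPORTED_NPS, PEAK_NPS):.0%}")
    print(f"reported / raw:              {fraction(REPORTED_NPS, raw_nps):.0%}")
    print(f"20M / reported:              {fraction(DOUBTED_EFFECTIVE_NPS, REPORTED_NPS):.0%}")
```

On these numbers the reported 250M is already about a quarter of the hardware peak, and an effective 20M would mean a further cut to 8% of the reported value.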
Last modified: Thu, 15 Apr 21 08:11:13 -0700