Author: Eugene Nalimov
Date: 15:40:51 08/29/03
On August 29, 2003 at 18:32:46, Jeremiah Penery wrote:

>On August 29, 2003 at 08:46:12, Robert Hyatt wrote:
>
>>On August 28, 2003 at 19:12:52, Jeremiah Penery wrote:
>>
>>>But it's not more latency than you get *best case* when using a traditional SMP
>>>setup. So you can only gain, even with a "poor algorithm".
>>
>>If you compare an SMP xeon to a dual 486 you _also_ "win".
>
>And what is that supposed to demonstrate?
>
>>But my point was that with a NUMA architecture, you might win a lot less
>>than you could, if the algorithm doesn't take into account the specific
>>architectural issues with a NUMA machine.
>>
>>My point was, again, that you want most references from a CPU to go to its
>>local memory for max performance. It's an issue on _all_ NUMA-type machines.
>
>Of course I know that. My point is that with Opteron, even if you are accessing
>non-local memory *always*, you are not accessing it slower than you would with,
>say, a traditional SMP machine (2x Xeon, for instance).
>Of course you can do a lot better - all I'm saying is that there's no way you're
>going to be doing worse.
>
>Either way you win, even with a crappy NUMA algorithm.

I am not so sure. With some NUMA implementations each memory bank has limited
bandwidth, so if you happen to allocate all the critical data in one node's
memory, you will overload its memory controller.

I have seen a case where an SMP application was blindly ported to a 32-CPU NUMA
system (8 nodes, 4 64-bit CPUs per node, 256Gb RAM total). The application ran
much slower on 32 CPUs than on a single CPU.

Thanks,
Eugene
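
A back-of-the-envelope sketch of the bandwidth argument, using the 8-node, 4-CPUs-per-node layout mentioned in the post. Everything else is an assumption made for illustration: the function name `per_cpu_bandwidth`, the placement labels, and the idea that each node's memory controller delivers 1.0 unit of bandwidth. The model ignores latency, caches, and interconnect limits, so it only shows the ceiling on memory throughput, not the measured slowdown.

```python
def per_cpu_bandwidth(num_nodes, cpus_per_node, node_bandwidth, placement):
    """Toy model: memory bandwidth available to each CPU when all CPUs
    stream from the hot data at once.

    placement (hypothetical labels):
      "single_node" - all critical data sits in one node's memory
      "interleaved" - critical data is spread evenly over all nodes
    """
    total_cpus = num_nodes * cpus_per_node
    if placement == "single_node":
        # One memory controller has to serve every CPU in the machine.
        usable = node_bandwidth
    elif placement == "interleaved":
        # Every node's controller contributes its share.
        usable = node_bandwidth * num_nodes
    else:
        raise ValueError(f"unknown placement: {placement}")
    return usable / total_cpus


# Layout from the post: 8 nodes, 4 CPUs per node; bandwidth in arbitrary units.
single = per_cpu_bandwidth(8, 4, 1.0, "single_node")
spread = per_cpu_bandwidth(8, 4, 1.0, "interleaved")
print(single)           # 0.03125
print(spread)           # 0.25
print(spread / single)  # 8.0
```

In the single-node case the total memory throughput stays at one controller's worth no matter how many CPUs are added, so extra CPUs buy no extra bandwidth. Contention and remote-access costs, which this sketch does not model, may then push performance below that of a single CPU.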
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.