Author: Robert Hyatt
Date: 05:46:12 08/29/03
On August 28, 2003 at 19:12:52, Jeremiah Penery wrote:

>On August 28, 2003 at 11:39:51, Robert Hyatt wrote:
>
>>On August 28, 2003 at 01:00:22, Jeremiah Penery wrote:
>>
>>>On a 2-way Opteron, accessing non-local memory should be at least as fast as
>>>accessing memory on a single-cpu P4 or Athlon system. For a 4-way Opteron, it
>>>still should not be worse, even if it requires 2 hops.
>>
>>Perhaps. But don't forget, when you have two cpus, 1/2 the memory _is_ slower
>>than the other half. By some fixed latency. A poor algorithm will definitely
>>perform slower than a good one, because the good one won't fight that extra
>>latency while the poor one will hit it all the time.
>
>But it's not more latency than you get *best case* when using a traditional SMP
>setup. So you can only gain, even with a "poor algorithm".

If you compare an SMP xeon to a dual 486 you _also_ "win". But my point was
that with a NUMA architecture, you might win a lot less than you could, if the
algorithm doesn't take into account the specific architectural issues with a
NUMA machine.

>
>>>Cache coherency is just as much a problem on SMP machines as on NUMA ones.
>>
>>no it isn't. For the reasons NUMA memory access is more problematic than
>>pure SMP access. The cache controllers have the _same_ latency issue. A
>>cache controller "way over there" takes much longer to "snoop/invalidate"
>>than one "right next door."
>>
>>So you run into the _same_ issue again. The "farther apart" two processors
>>are, the less stuff you want to share in memory, because the cache coherency
>>problem is slower to handle...
>
>I don't know that I understand what you're saying, but I also don't think you
>understand the Opteron NUMA setup very well. In a 2(4) CPU Opteron setup, every
>CPU (and memory bank) is *closer* to (or not farther from, in the worst case)
>each other CPU (or memory) than in a traditional SMP setup with a northbridge.
>Opterons are connected *directly*, rather than through a traditional bus.
>Latency between processors is not always uniform, but it is still faster than
>the traditional setup. I don't see why uniformity is an issue, because even
>memory accesses in a single-CPU setup are far from uniform in latency. The
>point is that the latency is lower than normal, regardless of the fact that it
>is non-uniform.

My point was, again, that you want most references from a CPU to go to its
local memory for max performance. It's an issue on _all_ NUMA-type machines.
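
To put rough numbers on the local-memory point, here is a minimal sketch of the average cost per memory reference as a function of how many references stay local. The function name and the two latency figures are made up for illustration; they are not measurements of an Opteron or of any other machine.

```python
def average_latency(local_fraction, local_ns, remote_ns):
    """Expected latency per reference when `local_fraction` of the
    references hit the CPU's own memory and the rest go to a remote node."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


# Hypothetical latencies, chosen only to make the arithmetic easy to follow.
LOCAL_NS = 100.0
REMOTE_NS = 150.0

# An algorithm that ignores placement on a 2-node box: half the references
# land on the other node.
naive = average_latency(0.5, LOCAL_NS, REMOTE_NS)

# An algorithm that keeps three quarters of its references local.
aware = average_latency(0.75, LOCAL_NS, REMOTE_NS)

print(f"placement-unaware: {naive} ns per reference")  # 125.0
print(f"placement-aware:   {aware} ns per reference")  # 112.5
```

Both cases may still beat an older bus-based box, but under these assumed numbers the placement-aware version leaves less performance on the table, which is the whole argument.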
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.