Author: Robert Hyatt
Date: 12:18:26 11/12/03
On November 12, 2003 at 14:50:32, Russell Reagan wrote:

>I am not sure I understand how NUMA works compared to SMP. I have gotten the
>impression from previous discussions of NUMA that it is inferior to SMP, but
>from what I've been able to find on the net, it sounds like they can be
>complementary to one another, so I'm confused.
>
>Are they exclusively different or is NUMA an addition to SMP?

NUMA == Non-Uniform Memory Architecture
SMP == Symmetric MultiProcessing.

In an SMP box, all processors are connected to memory using the same datapath. To help 4-way boxes avoid memory bottlenecks, most 4-way boxes use 4-way memory interleaving, spreading consecutive 8-byte chunks of memory across consecutive banks so that the reads can be done in parallel. I.e., a good 4-way box has 4x the potential memory bandwidth of a 1-way box, assuming Intel or AMD prior to the Opteron.

In a NUMA box, each processor has local memory, but each can see all of the other memory via routers. The problem comes to light when all four processors want to access memory that is local to one CPU. The requests become serialized and that kills performance. The issue is to prevent this from happening. That's what I had to do for the Compaq NUMA-alpha box last year. That's what Eugene and I re-did for Crafty version 19.5 to make it work reasonably on the Opteron NUMA boxes.

Opteron potentially has a higher memory bandwidth. But, as always, potential and practical don't mix. When all processors try to beat on the memory attached to the same physical processor, it produces a huge hot-spot with a very bad delay. Cache coherency has the same problem: when I have a line of memory in one cache controller, and that line gets modified in another cache controller, we have a _lot_ of cache-controller to cache-controller "noise" to handle that.

On a 4-way box, CPU A can quickly address its own local memory. It can not-so-quickly address memory on the two CPUs it is directly connected to. It is slower still to address memory on the last processor, as it has to go through one of the two it is directly connected to first, which adds an extra hop.

The goal is to have everything a single processor needs reside in its local memory. Then you avoid the NUMA overhead for most accesses and it runs like a bat out of Egypt.

>
>In order for a process to take advantage of SMP, it must split the work into two
>threads. What changes must be made in order for a program to be NUMA aware?

See above. Each processor needs important stuff kept in local memory rather than in memory that is one (or more) hops away on another CPU.
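
As a rough illustration of the 4-way layout described above, here is a minimal Python sketch. It is a toy model, not anything taken from Crafty: the ring wiring in `LINKS`, the node numbering, and the helper names `hops`, `total_hops` and `max_load` are all assumptions made up for the example. It only counts hops and how many CPUs pile onto one node's memory; it does not model real latencies.

```python
# Hypothetical 4-way NUMA box wired as a ring: 0-1-2-3-0.
# Each CPU is directly connected to two neighbours; the fourth is two hops away.
LINKS = {0: (1, 3), 1: (0, 2), 2: (1, 3), 3: (2, 0)}


def hops(cpu, node):
    """Number of hops for `cpu` to reach memory attached to `node`."""
    if cpu == node:
        return 0  # local memory
    if node in LINKS[cpu]:
        return 1  # directly connected neighbour
    return 2      # must go through one of the neighbours first


def total_hops(placement):
    """Sum of hops when CPU i reads data stored on node placement[i]."""
    return sum(hops(cpu, node) for cpu, node in enumerate(placement))


def max_load(placement):
    """Largest number of CPUs hitting the same node's memory (the hot spot)."""
    return max(placement.count(node) for node in LINKS)


local = [0, 1, 2, 3]  # every CPU keeps its data in its own local memory
hot = [0, 0, 0, 0]    # every CPU's data lives on node 0

print("local:", total_hops(local), "hops, max load", max_load(local))
print("hot:  ", total_hops(hot), "hops, max load", max_load(hot))
```

In this toy model the all-local placement costs zero hops and no node serves more than one CPU, while piling everything onto node 0 costs four hops in total and makes all four CPUs queue up on the same memory, which is the hot-spot case.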
Last modified: Thu, 15 Apr 21 08:11:13 -0700