Author: Vincent Diepeveen
Date: 07:45:40 09/03/03
On September 03, 2003 at 00:07:37, Jeremiah Penery wrote:

>On September 02, 2003 at 22:54:49, Robert Hyatt wrote:
>
>>Maybe. But I use threads. And on NUMA threads are _bad_. One example,
>>do you _really_ want to _share_ all the attack bitmap stuff? That means it
>>is in one processor's local memory, but will be slow for all others. What
>>about the instructions? Same thing.
>
>After some thinking, it seems to me that the *average* memory access speed will
>be the same no matter where the data is placed, for anything intended to be

If n CPUs can access local memory at 280 ns (R14000) while accessing remote memory takes 6-7 us, then which is faster?

>shared between all processors (in a small NUMA configuration). The reason for
>this is because what is local to one processor will be non-local to all others.

Unless each processor has its own local copy. Just for the record: shipping a megabyte from one CPU to another and then accessing it locally is way faster than accessing it remotely at random.

Thanks,
Vincent

>It doesn't matter if everything is local to the same processor or spread around,
>because the same percentage of total accesses will be non-local in any case
>(unless there is a disparity between the number of accesses each CPU is trying
>to accomplish).
>
>The only problem is that one processor's memory banks might get hammered, but
>that _is_ the same with an (similarly small) SMP configuration - all accesses go
>serially through one memory controller.

The memory banks can be accessed through the HUB in parallel. Reads cost nothing; only the calling processor has to wait a bit (6-7 us worst case, or more like 10-30 us on Bob's new 4-node machine).

>As machine size increases, of course, NUMA can run into more problems. But then
>SMP has its own problems as well (cost and complexity of memory sub-system,
>mostly).

Not if you directly connect each node to every other node, which is what Cray does. That keeps latency very low, but it's $$$$.
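Going back to the shipping-a-megabyte point above, here is a back-of-envelope sketch. Only the two latency figures (280 ns local, 6-7 us remote) come from this thread; the cache-line size and the bulk-copy bandwidth are assumptions for illustration, not from the post:

```python
# Rough model: touch 1 MB of remote data at random, one cache line per access,
# versus shipping the whole megabyte over once and then accessing it locally.
LINE = 128                 # assumed cache-line size in bytes (Origin-class machine)
MB = 1 << 20
fetches = MB // LINE       # 8192 line fetches to cover the megabyte

t_local = 280e-9           # local latency quoted in the thread (R14000)
t_remote = 6.5e-6          # remote latency quoted in the thread (6-7 us)
copy_bw = 100e6            # assumed bulk-copy bandwidth: 100 MB/s

remote_random = fetches * t_remote                   # every fetch pays remote latency
copy_then_local = MB / copy_bw + fetches * t_local   # one bulk ship, then local fetches

print(f"remote random access: {remote_random * 1e3:.1f} ms")    # ~53 ms
print(f"copy then local:      {copy_then_local * 1e3:.1f} ms")  # ~12 ms
```

Even with a deliberately pessimistic copy bandwidth, paying the transfer cost once and then reading at local latency comes out well ahead of eating the remote latency on every random access.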
So for remote latency, cc-NUMA is trivially always slower than Cray's approach. That you're still denying this is madness; we're talking about a factor of 20 here.
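To put numbers on the factor-of-20 claim and on the averaging argument quoted above, here is a hedged sketch. Only the two latencies come from the thread; the uniform-spread access model is an assumption:

```python
t_local = 280e-9    # local latency quoted in the thread (R14000)
t_remote = 6.5e-6   # remote latency quoted in the thread (6-7 us)

# The remote/local ratio is the "factor 20" (a bit over 20 with these figures).
print(f"remote/local ratio: {t_remote / t_local:.0f}x")

# If shared data is spread evenly over n nodes, a CPU hits its own node only
# 1/n of the time, so the average access cost climbs toward the remote latency:
for n in (2, 4, 8):
    avg = t_local / n + t_remote * (n - 1) / n
    print(f"{n} nodes: average access = {avg * 1e6:.2f} us")
```

Under this model the average is pinned near the remote latency as n grows, which is why giving each processor its own local copy, as suggested above, is what actually restores 280 ns access.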
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.