Author: Eugene Nalimov
Date: 19:19:59 08/29/03
On August 29, 2003 at 19:56:34, Jeremiah Penery wrote:

>On August 29, 2003 at 18:40:51, Eugene Nalimov wrote:
>
>>On August 29, 2003 at 18:32:46, Jeremiah Penery wrote:
>>
>>>Of course I know that. My point is that with Opteron, even if you are accessing
>>>non-local memory *always*, you are not accessing it slower than you would with,
>>>say, a traditional SMP machine (2x Xeon, for instance).
>>>Of course you can do a lot better - all I'm saying is that there's no way you're
>>>going to be doing worse.
>>>
>>>Either way you win, even with a crappy NUMA algorithm.
>>
>>I am not so sure. With some NUMA implementations each memory bank has limited
>>bandwidth, so if you happened to allocate all the critical data in one node's
>>memory you'll overload its memory controller.
>
>>I have seen a case where an SMP application was blindly ported to a 32-CPU NUMA
>>system (8 nodes, four 64-bit CPUs per node, 256 GB RAM total). The application ran
>>much slower on 32 CPUs than on a single CPU.
>
>I'm not talking about "some NUMA implementations". I'm talking about 2-4
>processor Opteron implementation. It should never have any of the problems you
>describe. Indeed, you can see from SPECRate that it scales very nearly as well
>as Itanium, and that still with compilers/OS still not very NUMA aware or very
>good for AMD64.

Sorry, from SPECrate I see only that *independent* processes scale well. Any OS
with minimal NUMA knowledge will allocate the data for each processor in its
local memory, thus totally avoiding such problems.

And are you sure that a NUMA system that, for independent processes, "scales
almost as well" as a shared-bus shared-memory system is really a good achievement?

Thanks,
Eugene
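To make the bandwidth argument concrete, here is a toy model in Python. It is only a sketch under stated assumptions: the function name `aggregate_bandwidth` and the figures of 1.0 GB/s demanded per CPU and 4.0 GB/s per memory controller are invented for illustration and are not measurements of any real machine. Only the 8-node, 4-CPUs-per-node layout comes from the post. The model compares placing all critical data on one node against letting each node serve its own CPUs.

```python
def aggregate_bandwidth(nodes, cpus_per_node, demand_per_cpu, node_bandwidth, local):
    """Return total memory bandwidth delivered, in the same units as the inputs.

    Toy model: each node has one memory controller with a hard bandwidth cap,
    and interconnect latency and contention overhead are ignored.
    """
    if local:
        # Each controller serves only the CPUs on its own node.
        per_node = min(cpus_per_node * demand_per_cpu, node_bandwidth)
        return nodes * per_node
    # All critical data lives on one node, so every CPU hits the same controller.
    total_demand = nodes * cpus_per_node * demand_per_cpu
    return min(total_demand, node_bandwidth)


# Hypothetical figures: 1.0 GB/s wanted per CPU, 4.0 GB/s per controller.
hot = aggregate_bandwidth(8, 4, 1.0, 4.0, local=False)
spread = aggregate_bandwidth(8, 4, 1.0, 4.0, local=True)
print(hot, spread)  # 4.0 32.0
```

In this model the single hot node caps all 32 CPUs at what one controller can deliver, while local allocation scales with the node count. The model cannot reproduce a 32-CPU run being slower than a single CPU; that would also need remote-access latency and contention costs, which are left out here.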
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.