Author: Robert Hyatt
Date: 19:45:04 09/02/03
On September 02, 2003 at 18:08:40, Vincent Diepeveen wrote:

>On September 02, 2003 at 00:14:02, Jeremiah Penery wrote:
>
>>On September 01, 2003 at 23:23:18, Robert Hyatt wrote:
>>
>>>On September 01, 2003 at 09:39:55, Jeremiah Penery wrote:
>>>
>>>>Any large (multi-node) SMP machine will have the same problem as NUMA with
>>>>respect to inter-node latency. SMP doesn't magically make node-to-node
>>>>communication any faster.
>>>
>>>Actually it does. SMP means symmetric.
>>>
>>>NUMA is _not_ symmetric.
>>
>>Of course. The acronym means "non-uniform memory access".
>>
>>But if you think "symmetric" necessarily means "faster", maybe you'd better look
>>in a dictionary.
>
>You're wrong by a factor of 2 or so in latency, and up to a factor of 5 for 128 CPUs.
>
>16-processor Alpha/Sun: $10 million
>64-processor Itanium2: $1 million
>
>Why would the price difference be like that?
>
>That 64-processor SGI Altix3000 has the best latency of any cc-NUMA
>machine: 2-3 us.
>
>Here is an 8-processor latency run on an 8-processor Altix3000 which I ran yesterday
>morning very early. VERY EARLY :)
>
>With 400MB hash per CPU:
> Average measured read time at 8 processes = 1039.312012 ns
>
>With 400MB hash per CPU:
> Average measured read time at 16 processes = 1207.127808 ns
>
>That is still a very good latency. SGI is simply superior here to other vendors.
>Their cheap cc-NUMA machines are very superior in latency when using a low number
>of processors. Note that latencies might have been slightly faster under IRIX
>instead of the Linux 2.4.20 kernel with SGI extensions enabled. I'm not
>sure though.
>
>But still you see that even with the latest and most modern hardware one can't
>get under 1 us latency with cc-NUMA.
>
>Please consider the hardware. Each brick has 2 duals. Each dual is connected
>with a direct link to the other dual on the brick.
>
>So you can see it kind of like a quad.
>At SGI, 4-CPU latency = 280 ns (measured at TERAS - Origin3800).
>At SGI, 8-CPU latency = 1 us (Altix3000).
>At SGI, 16-CPU latency = 1.2 us (Altix3000).
>
>However, with an 8-CPU or 16-CPU shared bus the latency will never be worse
>than 600 ns on a modern machine, whereas for cc-NUMA it goes up and up.

That's wrong. 16 CPUs will run into _huge_ latency issues. The bus won't be able to keep up. That's why nobody uses a bus on 16-way multiprocessors; it just doesn't scale that far. Machines beyond 8 CPUs are generally going to be NUMA, or they will be based on a _very_ expensive crossbar to connect processors and memory. Not a bus.

>A 512-processor cc-NUMA machine in fact has only 2 times better latency than a
>cluster has.

This is why discussions with you go nowhere. You mix terms. You redefine terms. You make up specification numbers.

There are shared-memory machines, and there are clusters. Clusters are _not_ shared-memory machines. In a cluster, nobody talks about memory latency; everybody talks about _network_ latency. In a NUMA (or crossbar or bus) machine, memory latency is mentioned all the time. But _not_ in a cluster.

>The advantage is that with a cluster you must use the MPI library,

I have absolutely no idea what you are talking about. I've been programming clusters for 20 years, and I didn't "have to use the MPI library". I did cluster stuff _before_ MPI existed. Hint: check on sockets. Not to mention PVM. OpenMP. UPC from Compaq. Etc.

>and I'm not
>using it on the great SGI machine. I simply allocate shared memory and
>communicate through that with my own code. You can call it OpenMP of course, but
>it simply is low-level parallelism.
>
>The big advantage of cc-NUMA is that you can run jobs of, say, 32 processors
>with just a worst-case 2 us latency, under the condition that the OS schedules
>well.

NUMA scales well. It doesn't perform that great. NUMA is price-driven, _not_ performance-driven.
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.