Computer Chess Club Archives



Subject: Re: Crafty and NUMA

Author: Robert Hyatt

Date: 19:45:04 09/02/03



On September 02, 2003 at 18:08:40, Vincent Diepeveen wrote:

>On September 02, 2003 at 00:14:02, Jeremiah Penery wrote:
>
>>On September 01, 2003 at 23:23:18, Robert Hyatt wrote:
>>
>>>On September 01, 2003 at 09:39:55, Jeremiah Penery wrote:
>>>
>>>>Any large (multi-node) SMP machine will have the same problem as NUMA with
>>>>respect to inter-node latency.  SMP doesn't magically make node-to-node
>>>>communication any faster.
>>>
>>>Actually it does.  SMP means symmetric.
>>>
>>>NUMA is _not_ symmetric.
>>
>>Of course.  The acronym means "non uniform memory access".
>>
>>But if you think "symmetric" necessarily means "faster", maybe you'd better look
>>in a dictionary.
>
>You're wrong by a factor of 2 or so in latency, and by up to a factor of 5 for
>128 CPUs.
>
>16 processor alpha/sun : 10 mln $
>64 processor itanium2  :  1 mln $
>
>Why would that price difference be like that?
>
>That 64 processor SGI altix3000 thing has the best latency of any cc-NUMA
>machine. It's 2-3 us.
>
>Here is an 8-processor latency run on an 8-processor Altix3000, which I ran
>yesterday morning very early. VERY EARLY :)
>
>with just 400MB hash a cpu:
> Average measured read read time at 8 processes = 1039.312012 ns
>
>with just 400MB hash a cpu:
> Average measured read read time at 16 processes = 1207.127808 ns
>
>That is still a very good latency. SGI is simply superior to other vendors here.
>Their cheap cc-NUMA machines have far better latency at low processor counts.
>Note that latencies might have been slightly lower had the machine been running
>IRIX instead of the Linux 2.4.20 kernel with SGI extensions enabled. I'm not
>sure though.
>
>But still, you can see that even the newest hardware can't get latency under
>1 us when using cc-NUMA.
>
>Please consider the hardware. Each brick has 2 duals. Each dual is connected
>by a direct link to the other dual on the brick.
>
>So you can see it kind of like a quad.
>
>At SGI 4 cpu's  latency = 280 ns (measured at TERAS - origin3800).
>At SGI 8 cpu's  latency =   1 us (Altix3000)
>At SGI 16 cpu's latency = 1.2 us (Altix3000)
>
>However, with an 8-CPU or 16-CPU shared bus the latency will never be worse
>than 600 ns on a modern machine, whereas for cc-NUMA it goes up and up.

That's wrong.  16 CPUs will run into _huge_ latency issues.  The BUS won't
be able to keep up.  That's why nobody uses a BUS on 16-way multiprocessors;
it just doesn't scale that far.  Machines beyond 8 CPUs are generally
going to be NUMA, or they will be based on a _very_ expensive crossbar
to connect processors and memory.  Not a BUS.
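
A toy sketch of why a single shared bus stops scaling, written in Python. This
is not a measurement and not a model of any real machine: it assumes the bus
behaves like a simple M/M/1 queue, and the 100 ns service time and the "one
memory request per CPU every 1000 ns" rate are made-up illustrative numbers.
The function name bus_latency_ns is hypothetical as well.

```python
import math

def bus_latency_ns(n_cpus, service_ns=100.0, request_interval_ns=1000.0):
    """Mean time per memory transaction on one shared bus (M/M/1 approximation).

    service_ns and request_interval_ns are illustrative assumptions,
    not figures for any real machine.
    """
    # Utilization: fraction of time the bus is busy serving all CPUs.
    rho = n_cpus * service_ns / request_interval_ns
    if rho >= 1.0:
        # Demand meets or exceeds bus capacity: the queue grows without bound.
        return math.inf
    # M/M/1 mean response time = service time / (1 - utilization).
    return service_ns / (1.0 - rho)

for n in (1, 2, 4, 8, 16):
    print(n, bus_latency_ns(n))
```

Under these assumed numbers the bus saturates at 10 CPUs, so the 16-CPU case
comes back as infinity. Different assumptions move the knee, but the latency
still climbs steeply as utilization approaches 1.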


>
>A 512-processor cc-NUMA machine in fact has only 2 times better latency than a
>cluster has.


This is why discussions with you go nowhere.  You mix terms.  You redefine
terms.  You make up specification numbers.

There are shared memory machines.  And there are clusters.  Clusters are
_not_ shared memory machines.  In a cluster, nobody talks about memory
latency.  Everybody talks about _network_ latency.  In a NUMA (or crossbar or
BUS) machine, memory latency is mentioned all the time.

But _not_ in a cluster.

> The advantage is that with a cluster you must use MPI library

I have absolutely no idea what you are talking about.  I've been
programming clusters for 20 years, and I didn't "have to use MPI
library".  I did cluster stuff _before_ MPI existed.  Hint:  check
on sockets.  Not to mention PVM.  OpenMP.  UPC from Compaq.  Etc.
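
For anyone who has not done message passing without MPI, here is a minimal
sketch using nothing but sockets, in Python. Several things in it are
assumptions made for illustration: socket.socketpair() stands in for a real TCP
connection between two machines, a thread stands in for the remote node, and
the helper names (send_msg, recv_msg, worker, run_demo) and the "sum four
integers" job are hypothetical.

```python
import socket
import struct
import threading

def send_msg(sock, payload: bytes) -> None:
    # Length-prefixed framing: 4-byte big-endian size, then the payload.
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def recv_exact(sock, n: int) -> bytes:
    # recv() may return fewer bytes than asked for, so loop until we have n.
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf

def recv_msg(sock) -> bytes:
    # Read the 4-byte size header, then exactly that many payload bytes.
    (size,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, size)

def worker(sock) -> None:
    # Hypothetical "remote node": receives four integers, sends back their sum.
    with sock:
        numbers = struct.unpack("!4i", recv_msg(sock))
        send_msg(sock, struct.pack("!i", sum(numbers)))

def run_demo() -> int:
    # socketpair() stands in for a real connection between two machines.
    master, remote = socket.socketpair()
    t = threading.Thread(target=worker, args=(remote,))
    t.start()
    with master:
        send_msg(master, struct.pack("!4i", 1, 2, 3, 4))
        (result,) = struct.unpack("!i", recv_msg(master))
    t.join()
    return result

print(run_demo())
```

Every round trip here goes through the network stack, which is why cluster
people talk about network latency rather than memory latency.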


> and I'm not
>using it on the great SGI machine. I simply allocate shared memory and
>communicate through that with my own code. You can call it OpenMP of course, but
>it is simply low-level parallelism.
>
>The big advantage of cc-NUMA is that you can run jobs of, say, 1 processor or 32
>with a worst-case latency of just 2 us, provided that the OS schedules
>well.

NUMA scales well.  It doesn't perform that great.  NUMA is price-driven.

_not_ performance-driven.
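
For contrast with the socket approach, the "allocate shared memory and
communicate through that" style from the quoted text can be sketched as
follows, in Python. This is not the code being discussed: it assumes a
Unix-like system, uses an anonymous mmap plus os.fork as stand-ins for whatever
allocation calls a real engine would use, and the name run_demo and the value
12345 are hypothetical.

```python
import mmap
import os
import struct

def run_demo() -> int:
    # Anonymous mapping; on Unix mmap defaults to MAP_SHARED, so a forked
    # child reads and writes the same pages as the parent.
    shm = mmap.mmap(-1, 8)
    shm[0:8] = struct.pack("=q", 0)

    pid = os.fork()
    if pid == 0:
        # Child process: store a result directly in the shared region.
        shm[0:8] = struct.pack("=q", 12345)
        os._exit(0)

    # Parent: wait for the child to exit, then read what it left behind.
    os.waitpid(pid, 0)
    (value,) = struct.unpack("=q", shm[0:8])
    shm.close()
    return value

print(run_demo())
```

No message is sent here; the parent just reads memory the child wrote. A real
parallel search would poll or lock instead of waiting for the child to exit,
and on a NUMA machine the cost of each such read depends on which node owns
the page.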




Last modified: Thu, 15 Apr 21 08:11:13 -0700
