Computer Chess Club Archives


Subject: Re: Cray

Author: Vincent Diepeveen

Date: 05:25:39 07/09/03



On July 09, 2003 at 01:19:00, Jeremiah Penery wrote:

>On July 09, 2003 at 00:09:03, Vincent Diepeveen wrote:
>
>>On July 08, 2003 at 19:37:48, Jeremiah Penery wrote:
>>
>>>On July 08, 2003 at 08:37:49, Vincent Diepeveen wrote:
>>>
>>>>On July 08, 2003 at 00:33:09, Jeremiah Penery wrote:
>>>>
>>>>>Each chip consumes only about 140W, rather than Vincent's assertion of 150KW.
>>>>
>>>>the 125KW is for Cray 'processors' not fujitsu processors that are in the NEC
>>>>machine.
>>>>
>>>>Ask bob i remember he quoted 500 kilowatt for a 4 processor Cray. So i divided
>>>>that by 4.
>>>
>>>That 500KW was probably for the entire machine.  Each processor probably
>>
>>Yes a 4 processor Cray.
>>
>>Just for your own understanding of what a cray is. it is NOT a processor.
>>It is a big block of electronics put together. So no wonder it eats quite a bit
>>more than the average cpu.
>
>Your own words: "the 125KW is for Cray 'processors'".  But that is not the
>truth.
>
>>Another major difference with Cray machines (using cray processor blocks) is
>>typically not using too many processors, because all processors are cross
>>connected with very fast connections. No clever routing system at all. Brute
>>force.
>
>Earth Simulator:
>
>Each node of 8 processors is connected to 128 IN (Interconnected Network)
>cabinets.  Each of those cabinets is connected to each other processing nodes
>(all 639 other nodes).  Each of these connections is 12.3GB/s bi-directional.
>Each IN cabinet has 2 640x640 crossbar switches to handle this.  "Several
>data-transfer modes, including access to three-dimensional (3D) sub-arrays and
>indirect access modes, are realized in hardware. In an operation that involves
>access to the data of a sub-array, the data is moved from one PN [processor
>node] to another in a single hardware operation..."  So, basically, every
>processor has 1-hop access to every other processor's memory.
>
>I guess that's how the machine sustained over 85% of theoretical peak performance
>on LINPACK, and 66% of theoretical peak on a real-world atmospheric simulation.

None of these test sets is hungry for random-access latency. On the contrary,
they just need big bandwidth, and the Earth Simulator has exactly that.

The Altix 3000, for example, has 6.4GB/s of bidirectional bandwidth to 4
processors. So that's 12.8GB/s. Seems to me that's about the maximum you can
get on older hardware.

Of course a vector processor kicks butt on those test sets, but that's what
it was designed for.

This design is superb for simulating nuclear stuff, but I bet they'll brag
more about other things, as we can see everywhere.

Nevertheless, this machine is record-breaking and will always be remembered for
that. Assuming it is designed for big vectors, its latency is probably quite a
bit worse: if you optimize for huge transfers at once, then a single small
transfer is probably very pricey.
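
To illustrate the bandwidth-versus-latency point, here is a minimal sketch of a simple linear cost model (time = startup latency + size / bandwidth). The 12.3GB/s figure is the per-connection number quoted above; the 10 microsecond startup latency is purely a hypothetical value chosen for illustration, not a measured figure for the Earth Simulator or any other machine.

```python
def transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Linear cost model: fixed startup latency plus size / bandwidth."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


def effective_bandwidth(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Bytes per second actually achieved for one transfer of this size."""
    return size_bytes / transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s)


LINK_BANDWIDTH = 12.3e9   # bytes/s, per-connection figure quoted in the thread
ASSUMED_LATENCY = 10e-6   # seconds; hypothetical, for illustration only

# Compare a cache-line-sized transfer with progressively larger ones.
for size in (64, 64 * 1024, 64 * 1024 * 1024):
    eff = effective_bandwidth(size, ASSUMED_LATENCY, LINK_BANDWIDTH)
    print(f"{size:>10} bytes: {eff / LINK_BANDWIDTH:6.1%} of peak bandwidth")
```

Under these assumed numbers, tiny transfers are dominated by the startup cost and reach only a small fraction of peak, while very large transfers get close to it. That is the pattern you would expect from a machine tuned for big vector transfers.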

So let's ignore the latency question; it simply wasn't designed for low
latency.

You don't put 8 processors in a node if you are designing for that.

Best regards,
Vincent




Last modified: Thu, 15 Apr 21 08:11:13 -0700
