Computer Chess Club Archives

Subject: Re: Cray

Author: Vincent Diepeveen
Date: 05:27:03 07/09/03

On July 09, 2003 at 08:25:39, Vincent Diepeveen wrote:

>On July 09, 2003 at 01:19:00, Jeremiah Penery wrote:
>
>>On July 09, 2003 at 00:09:03, Vincent Diepeveen wrote:
>>
>>>On July 08, 2003 at 19:37:48, Jeremiah Penery wrote:
>>>
>>>>On July 08, 2003 at 08:37:49, Vincent Diepeveen wrote:
>>>>
>>>>>On July 08, 2003 at 00:33:09, Jeremiah Penery wrote:
>>>>>
>>>>>>Each chip consumes only about 140W, rather than Vincent's assertion of 150KW.
>>>>>
>>>>>The 125KW is for Cray 'processors', not the Fujitsu processors that are in the
>>>>>NEC machine.
>>>>>
>>>>>Ask Bob; I remember he quoted 500 kilowatts for a 4-processor Cray, so I
>>>>>divided that by 4.
>>>>
>>>>That 500KW was probably for the entire machine.  Each processor probably
>>>
>>>Yes, a 4-processor Cray.
>>>
>>>Just so you understand what a Cray is: it is NOT a processor.
>>>It is a big block of electronics put together, so no wonder it eats quite a bit
>>>more power than the average CPU.
>>
>>Your own words: "the 125KW is for Cray 'processors'".  But that is not the
>>truth.
>>
>>>Another major difference is that Cray machines (using Cray processor blocks)
>>>typically don't use many processors, because all processors are
>>>cross-connected with very fast links. No clever routing system at all. Brute
>>>force.
>>
>>Earth Simulator:
>>
>>Each node of 8 processors is connected to 128 IN (Interconnection Network)
>>cabinets.  Each of those cabinets is connected to every other processing node
>>(all 639 other nodes).  Each of these connections is 12.3GB/s bi-directional.
>>Each IN cabinet has 2 640x640 crossbar switches to handle this.  "Several
>>data-transfer modes, including access to three-dimensional (3D) sub-arrays and
>>indirect access modes, are realized in hardware. In an operation that involves
>>access to the data of a sub-array, the data is moved from one PN [processor
>>node] to another in a single hardware operation..."  So, basically, every
>>processor has 1-hop access to every other processor's memory.
>>
>>I guess that's how the machine sustained over 85% of theoretical peak performance
>>on LINPACK, and 66% of theoretical peak on a real-world atmospheric simulation.
>
>None of these test sets is very hungry for random-access latency. On the contrary,
>they just need big bandwidth, and this Earth machine has just that.
>
>Altix3000, for example, has 6.4GB/s bidirectional bandwidth to 4 processors, so
>that's 12.8. Seems to me that's about the maximum you can get on old hardware.

What's new is HyperTransport, of course.

>Of course a vector processor kicks butt on those test sets, but that's what
>it was designed for.
>
>This design is superb for simulating nuclear stuff, but I bet they'll brag
>more about other things, as we can see everywhere.
>
>Nevertheless, this machine is record-breaking and will always be remembered for
>that. Assuming it is designed for big vectors, its latency is probably quite a
>bit worse, because if you optimize for huge transfers at once, then a single
>small transfer is probably very pricey.
>
>So let's ignore the latency question; it simply wasn't designed for it.
>
>You don't put 8 processors in a node if you design for latency.
>
>Best regards,
>Vincent
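
The arithmetic behind the figures traded in this thread can be checked with a short sketch. It only recomputes numbers quoted in the posts (500KW for a 4-processor Cray, 8 processors in each of 640 Earth Simulator nodes, 6.4GB for the Altix3000). The variable names are illustrative, and reading "6.4GB bidirectional" as 6.4 GB/s in each direction is an assumption taken from the "so that's 12.8" remark.

```python
# Back-of-the-envelope check of the figures quoted in this thread.
# All inputs come from the posts; the variable names are illustrative only.

cray_machine_kw = 500          # quoted power draw for a 4-processor Cray
cray_processors = 4
kw_per_cray_processor = cray_machine_kw / cray_processors

es_nodes = 640                 # "all 639 other nodes" plus the node itself
es_procs_per_node = 8
es_total_processors = es_nodes * es_procs_per_node

altix_gb_per_direction = 6.4   # assumption: GB/s in each direction
altix_gb_both_directions = 2 * altix_gb_per_direction

print(kw_per_cray_processor, es_total_processors, altix_gb_both_directions)
```

The per-processor power figure here is just the whole-machine number divided by four, which is the point under dispute in the thread: it says nothing about what a single chip draws.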
Last modified: Thu, 15 Apr 21 08:11:13 -0700