Author: Vincent Diepeveen
Date: 05:25:39 07/09/03
On July 09, 2003 at 01:19:00, Jeremiah Penery wrote:

>On July 09, 2003 at 00:09:03, Vincent Diepeveen wrote:
>
>>On July 08, 2003 at 19:37:48, Jeremiah Penery wrote:
>>
>>>On July 08, 2003 at 08:37:49, Vincent Diepeveen wrote:
>>>
>>>>On July 08, 2003 at 00:33:09, Jeremiah Penery wrote:
>>>>
>>>>>Each chip consumes only about 140W, rather than Vincent's assertion of 150KW.
>>>>
>>>>the 125KW is for Cray 'processors' not fujitsu processors that are in the NEC
>>>>machine.
>>>>
>>>>Ask bob i remember he quoted 500 kilowatt for a 4 processor Cray. So i divided
>>>>that by 4.
>>>
>>>That 500KW was probably for the entire machine. Each processor probably
>>
>>Yes a 4 processor Cray.
>>
>>Just for your own understanding of what a cray is. it is NOT a processor.
>>It is a big block of electronics put together. So no wonder it eats quite a bit
>>more than the average cpu.
>
>Your own words: "the 125KW is for Cray 'processors'". But that is not the
>truth.
>
>>Another major difference with Cray machines (using cray processor blocks) is
>>typically not using too many processors, because all processors are cross
>>connected with very fast connections. No clever routing system at all. Brute
>>force.
>
>Earth Simulator:
>
>Each node of 8 processors is connected to 128 IN (Interconnected Network)
>cabinets. Each of those cabinets is connected to each other processing nodes
>(all 639 other nodes). Each of these connections is 12.3GB/s bi-directional.
>Each IN cabinet has 2 640x640 crossbar switches to handle this. "Several
>data-transfer modes, including access to three-dimensional (3D) sub-arrays and
>indirect access modes, are realized in hardware. In an operation that involves
>access to the data of a sub-array, the data is moved from one PN [processor
>node] to another in a single hardware operation..." So, basically, every
>processor has 1-hop access to every other processor's memory.
>
>I guess that's how the machine sustained over 85% of theoretical peak performance
>on LINPACK, and 66% of theoretical peak on a real-world atmospheric simulation.

None of these test sets is hungry for random-access latency. On the contrary, they just need big bandwidth, and this Earth machine has exactly that.

Altix3000, for example, has 6.4GB bidirectional bandwidth to 4 processors. So that's 12.8. Seems to me that's about the maximum you can get out of old hardware.

Of course a vector processor kicks butt on those test sets, but that is what it was designed for. This design is superb for simulating nuclear stuff, though I bet they'll brag more about other things, as we can see everywhere. Nevertheless this machine is record-breaking and will always be remembered for that.

Assuming it is designed for big vectors, it is then quite a bit slower in latency, because if you optimize for huge transfers at once, a single transfer is probably very pricey. So let's ignore the latency question; it simply wasn't designed for it. You don't put 8 processors in a node if you design for latency.

Best regards,
Vincent
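
To put rough numbers on the bandwidth-versus-latency point in the post above, here is a minimal sketch in Python of the usual linear cost model (time = startup latency + size / bandwidth). The 12.3GB/s link rate is the figure quoted in the thread, treated here as a peak one-way rate; the 5 microsecond startup latency is purely an assumed placeholder, not a measured value for the Earth Simulator or any other machine, and the function names are made up for illustration.

```python
def transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Seconds for one transfer: fixed startup cost plus streaming time."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


def link_efficiency(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Fraction of peak bandwidth actually achieved for a transfer of this size."""
    streaming_time = size_bytes / bandwidth_bytes_per_s
    return streaming_time / transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s)


LINK_BANDWIDTH = 12.3e9   # bytes/s, figure quoted in the thread
ASSUMED_LATENCY = 5e-6    # seconds, hypothetical placeholder (not a measurement)

if __name__ == "__main__":
    # One cache-line-sized read, a medium block, and a large vector transfer.
    for size in (64, 64 * 1024, 64 * 1024 * 1024):
        eff = link_efficiency(size, ASSUMED_LATENCY, LINK_BANDWIDTH)
        print(f"{size:>10} bytes: {eff:.2%} of peak bandwidth")
```

Under these assumptions a 64-byte random access uses only a tiny fraction of the link, while a 64MB vector transfer runs at nearly full speed. Whatever the real latency is, the shape of the result is the same: workloads that move big blocks benefit from the bandwidth, and workloads made of many small random accesses are dominated by the startup cost.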
Last modified: Thu, 15 Apr 21 08:11:13 -0700