Author: Robert Hyatt
Date: 12:57:36 07/09/03
On July 09, 2003 at 00:09:03, Vincent Diepeveen wrote:

>On July 08, 2003 at 19:37:48, Jeremiah Penery wrote:
>
>>On July 08, 2003 at 08:37:49, Vincent Diepeveen wrote:
>>
>>>On July 08, 2003 at 00:33:09, Jeremiah Penery wrote:
>>>
>>>>NEC Earth Simulator has 5120 NEC SX-7(?) vector processors. Total cost was less
>>>>than $400m.
>>>
>>>around $680M it cost.
>>
>>Provide a reference for that $680m number, and I might believe you. I don't
>>accept random numbers without reference.
>>
>>Less than $400m is quoted at these sites:
>>http://www.mindfully.org/Technology/Supercomputer-Japanese23jul02.htm
>>http://www.siliconvalley.com/mld/siliconvalley/news/editorial/3709294.htm
>>http://www.time.com/time/2002/inventions/rob_earth.html
>>http://www-zeuthen.desy.de/~schoene/unter_texte/texte/sc2002/tsld004.htm
>>http://www.iht.com/articles/98820.html
>>http://cospa.phys.ntu.edu.tw/aapps/v12n2/v12-2n1.pdf
>>etc., etc.
>>
>>The highest price I've seen is around $500m, nowhere near your number.
>>
>>>>Here is a blurb about the chip, from the webpage:
>>>>
>>>>"Each AP consists of a 4-way super-scalar unit (SU), a vector unit (VU), and
>>>>main memory access control unit on a single LSI chip. The AP operates at a clock
>>>>frequency of 500MHz with some circuits operating at 1GHz. Each SU is a
>>>>super-scalar processor with 64KB instruction caches, 64KB data caches, and 128
>>>>general-purpose scalar registers. Branch prediction, data prefetching and
>>>>out-of-order instruction execution are all employed. Each VU has 72 vector
>>>>registers, each of which can has 256 vector elements, along with 8 sets of six
>>>>different types of vector pipelines: addition/shifting, multiplication,
>>>>division, logical operations, masking, and load/store. The same type of vector
>>>>pipelines works together by a single vector instruction and pipelines of
>>>>different types can operate concurrently."
>>>>
>>>>Each chip consumes only about 140W, rather than Vincent's assertion of 150KW.
>>>
>>>the 125KW is for Cray 'processors' not fujitsu processors that are in the NEC
>>>machine.
>>>
>>>Ask bob i remember he quoted 500 kilowatt for a 4 processor Cray. So i divided
>>>that by 4.
>>
>>That 500KW was probably for the entire machine. Each processor probably
>
>Yes a 4 processor Cray.
>
>Just for your own understanding of what a cray is. it is NOT a processor.
>It is a big block of electronics put together. So no wonder it eats quite a bit
>more than the average cpu.
>
>That's why i say that those power consuming Crays are history. They are just too
>expensive in power imho. If we then compare that they run at 1Ghz and can do
>like 29 instructions with 256 KB cache, then it is trivial why those matrix
>wonders no longer are a wonder.
>
>Opterons, Itaniums. You might call them expensive in power. It is trivial that
>they are very fast compared to a Cray when you compare the power consumption.
>
>A special water central was typically used to cool those vector Crays. Bob can
>tell more about that. He has had one there at his university.
>
>>consumes a very small amount of that. The Earth Simulator uses some 7MW of
>>power in total, though only about 10% comes from the processors.
>
>The typical supercomputer has a fast i/o and big routers. Those always eat
>trivially more power than the cpu's.
>
>7 MW nevertheless is hell of a lot.
>
>From chess viewpoint the only interesting thing is what is the one way pingpong
>latency time of the Earth Simulator at the big partitions which work with either
>MPI or openmp. Doesn't matter what of course. Of course not from processors near
>each other but with some routers in between them ;)
>
>Another major difference with Cray machines (using cray processor blocks) is
>typically not using too many processors, because all processors are cross
>connected with very fast connections. No clever routing system at all. Brute
>force.

Pure cross-bar, the best routing there is.

>
>If you want to make a supercomputer which is having big partitions of cpu's you
>need somewhere a compression point where n cpu's compress to a single bottleneck
>and then with some kind of router or special designed NUMA flex (that's the very
>fast SGI thing where they connect boxes of 64 processors to each other with).
>
>Cray never accepted such bottlenecks. It was just raw vector power. If you
>consider *when* those machines were constructed it was really a genius thing.
>
>It's only now that cpu's are so very well designed and high clocked with many
>instructions a clock that those vector blocks can be replaced safely.
>
>Note i bet they still get used because most scientist know shit from programming
>and you can't blame them.

Sorry, but a Cray will blow the doors off of _any_ microcomputer you care to
march up. It can sustain a ridiculous number of operations per cycle. IE it is
_easy_ on a single CPU to add two 64 bit floats, multiply the sum by another 64
bit float, add that to another 64 bit float. And I can do all of that, two
results per clock cycle, _forever_.

You have to understand vector processing first, to understand the power of a
Cray. Until you grasp that, you are talking nonsense.

>
>Today i spoke with someone who is running jobs a lot. What he calls a small job
>is a calculation at 24 processors that runs for 20 hours just doing floating
>point calculations.
>
>His software runs already for like 20 years or so at supercomputers.
>
>There is however some major differences with today and back then, that's why we
>spoke. I had promised him to help him speedup.
>
>What he is doing is that a processor has huge 3 dimensional arrays where he gets
>data from.
>
>Those are however allocated at the first thread that starts.
>
>So imagine that 1 poor thread is eating up all that bandwidth of the machine and
>that each cache line to get there takes like 5 microseconds or so to arrive.
>
>Then he can do 16 calculations (cache line length: 128 bytes divided by double
>size = 8 bytes). That's sick expensive.
>
>His software can be speeded up *quite* a lot.
>
>Trivially he ran also in the past at Crays with this software (nowadays it's in
>C, previously it was in fortran).
>
>They just do not know the bottlenecks of todays supercomputers.
>
>That's why the Cray for them was a great thing and always they will remember it
>for that.
>
>Because if you got a processor or 16 with shared memory and for every processor
>a lookup in that memory is equally fast, then it is trivial that this program,
>which definitely is a good example of how many programs still are, can be
>speeded up like 20 times easily at this SGI supercomputer.
>
>Yet the brute force of the Cray doesn't distinguish. So the Cray computer is
>even greater if you realize the average guy who has to do calculations on those
>machine.
>
>Up till recently more than 50% of the total system time goes to researchers who
>are doing physics (if that's the right english word). Calculation of models and
>oil simulations and bunches of known algorithms and unknown new ones that get
>tried with major matrixes.

False. They are used to design other microprocessors. Apple owns several. They
are used for weather forecasting. Simulations. _anything_ that requires
incredibly high operations per second on large data arrays. NUMA just doesn't
cut it for many such applications, and message-passing is worse. _that_ is the
"world of the Crays" and they are untouched there.

>
>In this case it was field calculations. Most of the researchers are already so
>happy that they can run in parallel on a machine that we'll forgive them that
>they do some stuff wrong.
>
>In all cases they draw the conclusion that the cpu is eating up the system time,
>because even if your program is 99% busy with calling cache lines from some
>remote node, the 'top' is showing that processes are busy 99.xx% of the system
>time.
>
>let's quote Seymour Cray:
> "If you were plowing a field, which would you rather use?
> Two strong oxen or 1024 chickens?"
>
>It's trivial that only the best programmers on the planet can go for that 1024
>chickens.
>

And for a good programmer, those two oxen are going to win the race.

>
>
>>>Trivially Cray machines using the opterons will be consuming less than that.
>>>Note that the cpu costs is nothing compared to what the routers etc eat.
>>
>>Of course.
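
For readers who want the two numeric arguments in this thread spelled out, here
is a rough sketch in Python. It is only an illustration: the function names are
made up for this note, the first function shows the data flow of the chained
add/multiply/add described above (not how a Cray actually executes it), and the
second just redoes the quoted cache-line arithmetic using the 128-byte line,
8-byte double, and "like 5 microseconds" figures as assumptions, not
measurements.

```python
# Illustrative sketch only. Function names are hypothetical, and the default
# numbers are the figures quoted in the thread, taken here as assumptions.

def chained_vector_op(a, b, c, d):
    """Return (a[i] + b[i]) * c[i] + d[i] for every element.

    A vector machine would chain the add, multiply, and add pipelines;
    this plain Python version only shows which values feed which step.
    """
    return [(x + y) * z + w for x, y, z, w in zip(a, b, c, d)]


def remote_line_throughput(line_bytes=128, double_bytes=8, latency_us=5.0):
    """Return (doubles per cache line, bytes per second).

    Assumes every cache line costs the full latency_us to arrive and that
    fetches are not overlapped, which is the worst case argued in the post.
    """
    doubles_per_line = line_bytes // double_bytes
    bytes_per_second = line_bytes * 1e6 / latency_us
    return doubles_per_line, bytes_per_second


if __name__ == "__main__":
    print(chained_vector_op([1.0, 2.0], [3.0, 4.0], [2.0, 0.5], [1.0, 1.0]))
    print(remote_line_throughput())
```

Under those assumed figures a thread gets 16 doubles per 5 microsecond fetch,
which works out to roughly 25.6 MB/s if nothing overlaps.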
Last modified: Thu, 15 Apr 21 08:11:13 -0700