Author: Vincent Diepeveen
Date: 21:09:03 07/08/03
On July 08, 2003 at 19:37:48, Jeremiah Penery wrote:

>On July 08, 2003 at 08:37:49, Vincent Diepeveen wrote:
>
>>On July 08, 2003 at 00:33:09, Jeremiah Penery wrote:
>>
>>>NEC Earth Simulator has 5120 NEC SX-7(?) vector processors. Total cost was less
>>>than $400m.
>>
>>around $680M it cost.
>
>Provide a reference for that $680m number, and I might believe you. I don't
>accept random numbers without reference.
>
>Less than $400m is quoted at these sites:
>http://www.mindfully.org/Technology/Supercomputer-Japanese23jul02.htm
>http://www.siliconvalley.com/mld/siliconvalley/news/editorial/3709294.htm
>http://www.time.com/time/2002/inventions/rob_earth.html
>http://www-zeuthen.desy.de/~schoene/unter_texte/texte/sc2002/tsld004.htm
>http://www.iht.com/articles/98820.html
>http://cospa.phys.ntu.edu.tw/aapps/v12n2/v12-2n1.pdf
>etc., etc.
>
>The highest price I've seen is around $500m, nowhere near your number.
>
>>>Here is a blurb about the chip, from the webpage:
>>>
>>>"Each AP consists of a 4-way super-scalar unit (SU), a vector unit (VU), and
>>>main memory access control unit on a single LSI chip. The AP operates at a clock
>>>frequency of 500MHz with some circuits operating at 1GHz. Each SU is a
>>>super-scalar processor with 64KB instruction caches, 64KB data caches, and 128
>>>general-purpose scalar registers. Branch prediction, data prefetching and
>>>out-of-order instruction execution are all employed. Each VU has 72 vector
>>>registers, each of which can have 256 vector elements, along with 8 sets of six
>>>different types of vector pipelines: addition/shifting, multiplication,
>>>division, logical operations, masking, and load/store. The same type of vector
>>>pipelines works together by a single vector instruction and pipelines of
>>>different types can operate concurrently."
>>>
>>>Each chip consumes only about 140W, rather than Vincent's assertion of 150KW.
>>
>>The 125KW is for Cray 'processors', not the Fujitsu processors that are in the NEC
>>machine.
>>
>>Ask bob, i remember he quoted 500 kilowatt for a 4 processor Cray. So i divided
>>that by 4.
>
>That 500KW was probably for the entire machine. Each processor probably

Yes, a 4-processor Cray. Just for your own understanding of what a Cray is: it is NOT a processor. It is a big block of electronics put together. So no wonder it eats quite a bit more than the average CPU.

That's why I say those power-consuming Crays are history. They are just too expensive in power, IMHO. If we then compare that they run at 1GHz and can do like 29 instructions with 256 KB cache, it is trivial why those matrix wonders are no longer a wonder. Opterons, Itaniums: you might call them expensive in power, but it is trivial that they are very fast compared to a Cray when you compare the power consumption.

A special water-cooling plant was typically used to cool those vector Crays. Bob can tell more about that; he has had one at his university.

>consumes a very small amount of that. The Earth Simulator uses some 7MW of
>power in total, though only about 10% comes from the processors.

The typical supercomputer has fast I/O and big routers. Those always eat trivially more power than the CPUs. 7 MW is nevertheless a hell of a lot.

From a chess viewpoint the only interesting thing is the one-way ping-pong latency of the Earth Simulator on the big partitions, whether they work with MPI or OpenMP; it doesn't matter which, of course. And of course not between processors near each other, but with some routers in between them ;)

Another major difference is that Cray machines (using Cray processor blocks) typically do not use too many processors, because all processors are cross-connected with very fast connections. No clever routing system at all. Brute force.
If you want to make a supercomputer with big partitions of CPUs, you need somewhere a compression point where n CPUs compress into a single bottleneck, and then some kind of router or something specially designed like NUMAflex (that's the very fast SGI interconnect they use to connect boxes of 64 processors to each other). Cray never accepted such bottlenecks. It was just raw vector power. If you consider *when* those machines were constructed, it was really a genius thing. It's only now that CPUs are so well designed and so highly clocked, with many instructions per clock, that those vector blocks can be replaced safely. Note, I bet they still get used, because most scientists know shit about programming, and you can't blame them.

Today I spoke with someone who runs jobs a lot. What he calls a small job is a calculation on 24 processors that runs for 20 hours just doing floating-point calculations. His software has already been running on supercomputers for something like 20 years. There are, however, some major differences between today and back then; that's why we spoke. I had promised to help him speed it up.

What he does is this: each processor reads its data from huge 3-dimensional arrays. Those arrays, however, are allocated by the first thread that starts. So imagine that one poor node is serving up all that bandwidth for the whole machine, and that each cache line takes something like 5 microseconds to arrive. Per line he can then do 16 calculations (cache-line length of 128 bytes divided by the 8-byte size of a double). That's sick expensive. His software can be sped up *quite* a lot.

Trivially, he also ran this software on Crays in the past (nowadays it's in C; previously it was in Fortran). They just do not know the bottlenecks of today's supercomputers. That's why the Cray was a great thing for them, and they will always remember it for that.
Because if you have 16 or so processors with shared memory, and a lookup in that memory is equally fast for every processor, then it is trivial that this program, which definitely is a good example of how many programs still look, can easily be sped up like 20 times on this SGI supercomputer. Yet the brute force of the Cray doesn't make that distinction. So the Cray is even greater when you consider the average guy who has to do calculations on those machines.

Up till recently, more than 50% of total system time went to researchers doing physics: calculation of models, oil simulations, and bunches of known algorithms and unknown new ones that get tried on big matrices. In this case it was field calculations. Most researchers are already so happy that they can run in parallel on a machine at all that we'll forgive them for doing some things wrong. In all cases they conclude that the CPU is eating up the system time, because even if your program spends 99% of its time fetching cache lines from some remote node, 'top' still shows the processes busy 99.xx% of the system time.

Let's quote Seymour Cray: "If you were plowing a field, which would you rather use? Two strong oxen or 1024 chickens?"

It's trivial that only the best programmers on the planet can go for those 1024 chickens.

>>Trivially Cray machines using the opterons will be consuming less than that.
>>Note that the cpu costs is nothing compared to what the routers etc eat.
>
>Of course.