Author: Keith Evans
Date: 14:29:02 07/10/03
On July 10, 2003 at 16:36:50, Robert Hyatt wrote:

>On July 09, 2003 at 19:10:01, Vincent Diepeveen wrote:
>
>>On July 09, 2003 at 15:57:36, Robert Hyatt wrote:
>>
>>>On July 09, 2003 at 00:09:03, Vincent Diepeveen wrote:
>>>
>>>>On July 08, 2003 at 19:37:48, Jeremiah Penery wrote:
>>>>
>>>>>On July 08, 2003 at 08:37:49, Vincent Diepeveen wrote:
>>>>>
>>>>>>On July 08, 2003 at 00:33:09, Jeremiah Penery wrote:
>>>>>>
>>>>>>>NEC Earth Simulator has 5120 NEC SX-7(?) vector processors. Total cost was less than $400m.
>>>>>>
>>>>>>around $680M it cost.
>>>>>
>>>>>Provide a reference for that $680m number, and I might believe you. I don't accept random numbers without reference.
>>>>>
>>>>>Less than $400m is quoted at these sites:
>>>>>http://www.mindfully.org/Technology/Supercomputer-Japanese23jul02.htm
>>>>>http://www.siliconvalley.com/mld/siliconvalley/news/editorial/3709294.htm
>>>>>http://www.time.com/time/2002/inventions/rob_earth.html
>>>>>http://www-zeuthen.desy.de/~schoene/unter_texte/texte/sc2002/tsld004.htm
>>>>>http://www.iht.com/articles/98820.html
>>>>>http://cospa.phys.ntu.edu.tw/aapps/v12n2/v12-2n1.pdf
>>>>>etc., etc.
>>>>>
>>>>>The highest price I've seen is around $500m, nowhere near your number.
>>>>>
>>>>>>>Here is a blurb about the chip, from the webpage:
>>>>>>>
>>>>>>>"Each AP consists of a 4-way super-scalar unit (SU), a vector unit (VU), and main memory access control unit on a single LSI chip. The AP operates at a clock frequency of 500MHz with some circuits operating at 1GHz. Each SU is a super-scalar processor with 64KB instruction caches, 64KB data caches, and 128 general-purpose scalar registers. Branch prediction, data prefetching and out-of-order instruction execution are all employed. Each VU has 72 vector registers, each of which can has 256 vector elements, along with 8 sets of six different types of vector pipelines: addition/shifting, multiplication, division, logical operations, masking, and load/store. The same type of vector pipelines works together by a single vector instruction and pipelines of different types can operate concurrently."
>>>>>>>
>>>>>>>Each chip consumes only about 140W, rather than Vincent's assertion of 150KW.
>>>>>>
>>>>>>the 125KW is for Cray 'processors' not fujitsu processors that are in the NEC machine.
>>>>>>
>>>>>>Ask bob i remember he quoted 500 kilowatt for a 4 processor Cray. So i divided that by 4.
>>>>>
>>>>>That 500KW was probably for the entire machine. Each processor probably
>>>>
>>>>Yes a 4 processor Cray.
>>>>
>>>>Just for your own understanding of what a cray is. it is NOT a processor. It is a big block of electronics put together. So no wonder it eats quite a bit more than the average cpu.
>>>>
>>>>That's why i say that those power consuming Crays are history. They are just too expensive in power imho. If we then compare that they run at 1Ghz and can do like 29 instructions with 256 KB cache, then it is trivial why those matrix wonders no longer are a wonder.
>>>>
>>>>Opterons, Itaniums. You might call them expensive in power. It is trivial that they are very fast compared to a Cray when you compare the power consumption.
>>>>
>>>>A special water central was typically used to cool those vector Crays. Bob can tell more about that. He has had one there at his university.
>>>>
>>>>>consumes a very small amount of that. The Earth Simulator uses some 7MW of power in total, though only about 10% comes from the processors.
>>>>
>>>>The typical supercomputer has a fast i/o and big routers. Those always eat trivially more power than the cpu's.
>>>>
>>>>7 MW nevertheless is hell of a lot.
>>>>
>>>>From chess viewpoint the only interesting thing is what is the one way pingpong latency time of the Earth Simulator at the big partitions which work with either MPI or openmp. Doesn't matter what of course. Of course not from processors near each other but with some routers in between them ;)
>>>>
>>>>Another major difference with Cray machines (using cray processor blocks) is typically not using too many processors, because all processors are cross connected with very fast connections. No clever routing system at all. Brute force.
>>>
>>>Pure cross-bar, the best routing there is.
>>>
>>>>
>>>>If you want to make a supercomputer which is having big partitions of cpu's you need somewhere a compression point where n cpu's compress to a single bottleneck and then with some kind of router or special designed NUMA flex (that's the very fast SGI thing where they connect boxes of 64 processors to each other with).
>>>>
>>>>Cray never accepted such bottlenecks. It was just raw vector power. If you consider *when* those machines were constructed it was really a genius thing.
>>>>
>>>>It's only now that cpu's are so very well designed and high clocked with many instructions a clock that those vector blocks can be replaced safely.
>>>>
>>>>Note i bet they still get used because most scientist know shit from programming and you can't blame them.
>>>
>>>Sorry, but a Cray will blow the doors off of _any_ microcomputer you care to march up. It can sustain a ridiculous number of operations per cycle. IE it
>>
>>Gotta love your comparisions :)
>>
>>You show up with a cray supercomputer and i may only bring something my hands can carry :)
>
>Feel free to do so. I'll take a T932 over _anything_ you can carry by hand, no questions asked.
>
>>
>>I would prefer to show up with the nowadays 1440 processor and 3 gflops teras though :)
>>
>>>is _easy_ on a single CPU to add two 64 bit floats, multiply the sum by another 64 bit float, add that to another 64 bit float. And I can do all of that, two results per clock cycle, _forever_.
>>>
>>>You have to understand vector processing first, to understand the power of a Cray. Until you grasp that, you are talking nonsense.
>>
>>>>
>>>>Today i spoke with someone who is running jobs a lot. What he calls a small job is a calculatoin at 24 processors that runs for 20 hours just doing floating point calculations.
>>>>
>>>>His software runs already for like 20 years or so at supercomputers.
>>>>
>>>>There is however some major differences with today and back then, that's why we spoke. I had promised him to help him speedup.
>>>>
>>>>What he is doing is that a processor has huge 3 dimensional arrays where he gets data from.
>>>>
>>>>Those are however allocated at the first thread that starts.
>>>>
>>>>So imagine that 1 poor thread is eating up all that bandwidth of the machine and that each cache line to get there takes like 5 microseconds or so to arrive.
>>>>
>>>>Then he can do 16 calculations (cache line length: 128 bytes divided by double size = 8 bytes). That's sick expensive.
>>>>
>>>>His software can be speeded up *quite* a lot.
>>>>
>>>>Trivially he ran also in the past at Crays with this software (nowadays it's in C, previously it was in fortran).
>>>>
>>>>They just do not know the bottlenecks of todays supercomputers.
>>>>
>>>>That's why the Cray for them was a great thing and always they will remember it for that.
>>>>
>>>>Because if you got a processor or 16 with shared memory and for every processor a lookup in that memory is equally fast, then it is trivial that this program, which definitely is a good example of how many programs still are, can be speeded up like 20 times easily at this SGI supercomputer.
>>>>
>>>>Yet the brute force of the Cray doesn't distinguish. So the Cray computer is even greater if you realize the average guy who has to do calculations on those machine.
>>>>
>>>>Up till recently more than 50% of the total system time goes to researchers who are doing physics (if that's the right english word). Calculation of models and oil simulations and bunches of known algorithms and unknown new ones that get tried with major matrixes.
>>>
>>>False. They are used to design other microprocessors. Apple owns several. They are used for weather forecasting. Simulations. _anything_ that requires incredibly high operations per second on large data arrays. NUMA just doesn't cut it for many such applications, and message-passing is worse. _that_ is the "world of the Crays" and they are untouched there.
>>
>>I'm not sure about the microprocessor designs, we can ask AMD and intel after it. Apple doesn't produce microprocessors at all. They use IBM processors nowadays and before IBM they used Motorola.
>
>Apple produces _machines_. They do circuit layout and testing on a Cray.

Why would anyone use a Cray for circuit layout? Look at what NVidia uses to design their huge chips - I don't see any Crays there. Do Synopsys, Cadence,... support Crays? As far as simulation acceleration goes, I think that Xilinx Virtex2 parts are better than a Cray.

From http://www.clock.org/~fair/computers/sgi-cray.html

'Apple Computer bought a Cray X/MP-48 (four 9ns clock cycle processors, eight megawords of RAM) to help design a supercomputer on a chip. The project, alas, failed. A probably apocryphal story: John Scully met Seymour Cray, and told Seymour, "You know, we're using a Cray to design the next Macintosh." Seymour scratched his head and thoughtfully replied, "Well, that's funny - I'm using a Macintosh to design the next Cray." Apple's Cray subsequently found a useful life doing plastic flow modelling for the injection molds that Apple used for the cases of its products (the Cray cut months off the time to produce a production-quality plastic mold tool, and saved hundreds of thousands of dollars a shot). It was also a symbol of Apple Computer's commitment to having a world-class R&D facility, which served to attract many superior computing researchers over the years. They've all been laid off now, of course.'
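A side note on the one-way ping-pong latency Vincent asks about above: the usual way to measure it is to have two MPI ranks bounce a tiny message back and forth and halve the measured round trip. A minimal sketch in C follows; the 8-byte message size and the repetition count are arbitrary illustration choices, not anything specific to the Earth Simulator.

/* One-way latency via MPI ping-pong; run with exactly 2 ranks.
   Message size and repetition count are arbitrary illustration values. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, i, reps = 10000;
    char buf[8] = {0};                /* tiny message isolates latency */
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < reps; i++) {
        if (rank == 0) {
            MPI_Send(buf, 8, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, 8, MPI_BYTE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, 8, MPI_BYTE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, 8, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("one-way latency ~ %g us\n", (t1 - t0) / (2.0 * reps) * 1e6);

    MPI_Finalize();
    return 0;
}

Whether the two ranks sit on neighbouring nodes or with routers in between is decided by the job placement, which is exactly the distinction Vincent is making.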
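Bob's example (add two 64-bit floats, multiply the sum by a third, add a fourth, sustaining results every cycle) is the kind of loop that chained vector pipelines and a vectorizing compiler are built for. A toy version in C, with made-up array and function names; actually sustaining a result per clock depends on the memory system keeping the load pipes fed.

/* Chained add -> multiply -> add over long vectors: the access pattern
   the vector units discussed above are designed to stream.
   Array names and the function name are illustrative only. */
void chained_madd(long n, const double *b, const double *c,
                  const double *d, const double *e, double *a)
{
    for (long i = 0; i < n; i++)
        a[i] = (b[i] + c[i]) * d[i] + e[i];   /* add, then multiply, then add */
}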
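Vincent's story about the huge 3-D arrays being allocated and first written by the starting thread is the classic first-touch placement problem on ccNUMA machines like the SGI boxes he mentions: every page lands on one node, and every other processor drags its cache lines across the interconnect. A minimal sketch of the usual fix, assuming OpenMP and a 1-D array standing in for his 3-D data (names and sizes are made up): initialize the data with the same parallel decomposition as the compute loop, so each processor's pages end up on its own node.

/* First-touch placement sketch (OpenMP). The key point is that the loop
   that first writes the arrays uses the same static schedule as the
   compute loop, so each thread's pages are placed on its own node. */
#include <stdio.h>
#include <stdlib.h>

#define N (1L << 24)          /* illustrative size: 16M doubles per array */

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double sum = 0.0;
    long i;

    if (!a || !b)
        return 1;

    /* A serial initialization here would place every page behind the
       first thread, which is the bottleneck described above. */

    /* Parallel first touch: pages get distributed across the nodes. */
    #pragma omp parallel for schedule(static)
    for (i = 0; i < N; i++) {
        a[i] = 0.0;
        b[i] = 1.0;
    }

    /* Compute loop with the same decomposition: mostly local accesses. */
    #pragma omp parallel for schedule(static) reduction(+:sum)
    for (i = 0; i < N; i++) {
        a[i] = 2.0 * b[i];
        sum += a[i];
    }

    printf("%f\n", sum);
    free(a);
    free(b);
    return 0;
}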