Computer Chess Club Archives

Subject: Re: Cray

Author: Robert Hyatt
Date: 17:42:00 07/10/03
On July 10, 2003 at 17:29:02, Keith Evans wrote:

>On July 10, 2003 at 16:36:50, Robert Hyatt wrote:
>
>>On July 09, 2003 at 19:10:01, Vincent Diepeveen wrote:
>>
>>>On July 09, 2003 at 15:57:36, Robert Hyatt wrote:
>>>
>>>>On July 09, 2003 at 00:09:03, Vincent Diepeveen wrote:
>>>>
>>>>>On July 08, 2003 at 19:37:48, Jeremiah Penery wrote:
>>>>>
>>>>>>On July 08, 2003 at 08:37:49, Vincent Diepeveen wrote:
>>>>>>
>>>>>>>On July 08, 2003 at 00:33:09, Jeremiah Penery wrote:
>>>>>>>
>>>>>>>>NEC Earth Simulator has 5120 NEC SX-7(?) vector processors.  Total cost was less
>>>>>>>>than $400m.
>>>>>>>
>>>>>>>It cost around $680M.
>>>>>>
>>>>>>Provide a reference for that $680m number, and I might believe you.  I don't
>>>>>>accept random numbers without reference.
>>>>>>
>>>>>>Less than $400m is quoted at these sites:
>>>>>>http://www.mindfully.org/Technology/Supercomputer-Japanese23jul02.htm
>>>>>>http://www.siliconvalley.com/mld/siliconvalley/news/editorial/3709294.htm
>>>>>>http://www.time.com/time/2002/inventions/rob_earth.html
>>>>>>http://www-zeuthen.desy.de/~schoene/unter_texte/texte/sc2002/tsld004.htm
>>>>>>http://www.iht.com/articles/98820.html
>>>>>>http://cospa.phys.ntu.edu.tw/aapps/v12n2/v12-2n1.pdf
>>>>>>etc., etc.
>>>>>>
>>>>>>The highest price I've seen is around $500m, nowhere near your number.
>>>>>>
>>>>>>>>Here is a blurb about the chip, from the webpage:
>>>>>>>>
>>>>>>>>"Each AP consists of a 4-way super-scalar unit (SU), a vector unit (VU), and
>>>>>>>>main memory access control unit on a single LSI chip. The AP operates at a clock
>>>>>>>>frequency of 500MHz with some circuits operating at 1GHz. Each SU is a
>>>>>>>>super-scalar processor with 64KB instruction caches, 64KB data caches, and 128
>>>>>>>>general-purpose scalar registers. Branch prediction, data prefetching and
>>>>>>>>out-of-order instruction execution are all employed. Each VU has 72 vector
>>>>>>>>registers, each of which has 256 vector elements, along with 8 sets of six
>>>>>>>>different types of vector pipelines: addition/shifting, multiplication,
>>>>>>>>division, logical operations, masking, and load/store. The same type of vector
>>>>>>>>pipelines works together by a single vector instruction and pipelines of
>>>>>>>>different types can operate concurrently."
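The quoted description says each vector register holds 256 elements. A loop over a longer array therefore has to be processed in chunks of at most that many elements, a transformation vectorizing compilers call strip-mining. Below is a minimal sketch in plain C, assuming a maximum vector length of 256; `VLEN` and `vadd_stripmined` are hypothetical names, and the inner loop only stands in for what would be a single vector load/add/store sequence on the real hardware.

```c
#include <stddef.h>

/* Maximum elements per vector register, taken from the quoted description. */
#define VLEN 256

/* Compute c[i] = a[i] + b[i] for n elements, in strips of at most VLEN
   elements. On a vector machine each strip would map to one vector
   instruction sequence; here it is ordinary scalar C. Returns the number
   of strips processed. */
size_t vadd_stripmined(const double *a, const double *b, double *c, size_t n)
{
    size_t strips = 0;
    for (size_t start = 0; start < n; start += VLEN) {
        /* The final strip may be shorter than VLEN. */
        size_t len = (n - start < VLEN) ? (n - start) : VLEN;
        for (size_t i = 0; i < len; i++)
            c[start + i] = a[start + i] + b[start + i];
        strips++;
    }
    return strips;
}
```

For n = 1000 this makes four strips: three of 256 elements and a final one of 232.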
>>>>>>>>
>>>>>>>>Each chip consumes only about 140W, rather than Vincent's assertion of 150KW.
>>>>>>>
>>>>>>>The 125KW is for Cray 'processors', not for the processors that are in the NEC
>>>>>>>machine.
>>>>>>>
>>>>>>>Ask Bob; I remember he quoted 500 kilowatts for a 4-processor Cray, so I
>>>>>>>divided that by 4.
>>>>>>
>>>>>>That 500KW was probably for the entire machine.  Each processor probably
>>>>>
>>>>>Yes, a 4-processor Cray.
>>>>>
>>>>>Just for your own understanding of what a Cray is: it is NOT a processor.
>>>>>It is a big block of electronics put together, so no wonder it eats quite a
>>>>>bit more power than the average CPU.
>>>>>
>>>>>That's why I say that those power-hungry Crays are history. They are just too
>>>>>expensive in power, IMHO. If we then consider that today's processors run at
>>>>>1 GHz, can do something like 29 instructions and have a 256 KB cache, it is
>>>>>obvious why those matrix wonders are no longer a wonder.
>>>>>
>>>>>Take Opterons or Itaniums. You might call them expensive in power, but it is
>>>>>obvious that they are very fast compared to a Cray when you compare the power
>>>>>consumption.
>>>>>
>>>>>A dedicated water-cooling plant was typically used to cool those vector Crays.
>>>>>Bob can tell you more about that; he had one at his university.
>>>>>
>>>>>>consumes a very small amount of that.  The Earth Simulator uses some 7MW of
>>>>>>power in total, though only about 10% comes from the processors.
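As a rough consistency check (a sketch, not a measurement), the figures quoted in this thread can be combined: if about 10% of 7 MW goes to 5120 processors, each averages roughly 137 W, in line with the ~140 W per chip quoted earlier, whereas dividing 500 KW by 4, as Vincent did, yields 125 KW because it spreads the whole machine's draw over its processors. The function name `watts_per_processor` is hypothetical.

```c
/* Average power per processor in watts: total system power (W) times the
   fraction of it drawn by the processors, divided by the processor count.
   Returns 0.0 for a non-positive processor count. */
double watts_per_processor(double total_watts, double cpu_fraction, int nprocs)
{
    if (nprocs <= 0)
        return 0.0;
    return total_watts * cpu_fraction / nprocs;
}
```

With the numbers above, `watts_per_processor(7e6, 0.10, 5120)` gives 136.71875 (about 137 W), while `watts_per_processor(500e3, 1.0, 4)` reproduces the 125,000 W figure.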
>>>>>
>>>>>The typical supercomputer has fast I/O and big routers, and those always
>>>>>obviously eat more power than the CPUs.
>>>>>
>>>>>7 MW is nevertheless a hell of a lot.
>>>>>
>>>>>From a chess viewpoint, the only interesting thing is the one-way ping-pong
>>>>>latency of the Earth Simulator across the big partitions that run either MPI
>>>>>or OpenMP (which one doesn't matter, of course). Not between processors near
>>>>>each other, of course, but with some routers in between them ;)
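A one-way ping-pong latency is usually estimated by bouncing a small message back and forth many times and halving the average round-trip time. The sketch below shows only that method: it uses two POSIX pipes between a parent and a forked child on one machine, so it measures pipe and scheduler latency, not an interconnect. On a real partition the same loop would be written with MPI point-to-point calls such as `MPI_Send`/`MPI_Recv` between ranks placed far apart. The function name `pingpong_oneway_ns` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Estimate one-way latency in nanoseconds: bounce a 1-byte message between
   a parent and a forked child over two pipes `iters` times, then halve the
   average round-trip time. Returns a negative value on error. */
double pingpong_oneway_ns(int iters)
{
    int to_child[2], to_parent[2];
    char byte = 'x';
    struct timespec t0, t1;
    int ok = 1;

    if (iters <= 0 || pipe(to_child) != 0)
        return -1.0;
    if (pipe(to_parent) != 0) {
        close(to_child[0]);
        close(to_child[1]);
        return -1.0;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(to_child[0]);
        close(to_child[1]);
        close(to_parent[0]);
        close(to_parent[1]);
        return -1.0;
    }
    if (pid == 0) {
        /* Child: echo every byte straight back ("pong"). */
        close(to_child[1]);
        close(to_parent[0]);
        for (int i = 0; i < iters; i++) {
            if (read(to_child[0], &byte, 1) != 1 ||
                write(to_parent[1], &byte, 1) != 1)
                _exit(1);
        }
        _exit(0);
    }

    /* Parent: keep only the pipe ends it uses. */
    close(to_child[0]);
    close(to_parent[1]);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iters && ok; i++)   /* send "ping", wait for "pong" */
        ok = write(to_child[1], &byte, 1) == 1 &&
             read(to_parent[0], &byte, 1) == 1;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    close(to_child[1]);   /* if the loop stopped early, the child sees EOF */
    close(to_parent[0]);
    waitpid(pid, NULL, 0);
    if (!ok)
        return -1.0;

    double total_ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
                    + (double)(t1.tv_nsec - t0.tv_nsec);
    return total_ns / iters / 2.0;   /* one way = half a round trip */
}
```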
>>>>>
>>>>>Another major difference with Cray machines (built from Cray processor blocks)
>>>>>is that they typically do not use many processors, because all processors are
>>>>>cross-connected with very fast links. No clever routing system at all. Brute
>>>>>force.
>>>>
>>>>Pure cross-bar, the best routing there is.
>>>>
>>>>
>>>>>
>>>>>If you want to build a supercomputer with big partitions of CPUs, you need a
>>>>>compression point somewhere where n CPUs funnel into a single bottleneck, and
>>>>>then some kind of router or a specially designed interconnect such as NUMAflex
>>>>>(the very fast SGI technology they use to connect boxes of 64 processors to
>>>>>each other).
>>>>>
>>>>>Cray never accepted such bottlenecks. It was just raw vector power. If you
>>>>>consider *when* those machines were built, they were really a work of genius.
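A deliberately simplified model of the difference between a full crossbar and funnelling processors through a shared link: with a crossbar each processor keeps a dedicated path at full link bandwidth, but the number of switch points grows as the product of the port counts, which is one reason crossbar machines stay at modest processor counts; behind a single shared uplink the hardware stays cheap, but n processors that all talk at once split its bandwidth n ways. The function names and the model itself are illustrative assumptions, not a description of any specific Cray or SGI interconnect.

```c
#include <stdbool.h>

/* Bandwidth each of n processors gets when all of them communicate at once.
   crossbar == true:  every processor has its own path at link_bw.
   crossbar == false: all n share one uplink of link_bw, split evenly. */
double per_cpu_bandwidth(double link_bw, int n, bool crossbar)
{
    if (n <= 0)
        return 0.0;
    return crossbar ? link_bw : link_bw / n;
}

/* Switch points in a full crossbar joining `inputs` ports to `outputs` ports. */
long long crossbar_crosspoints(long long inputs, long long outputs)
{
    return inputs * outputs;
}
```

For example, 64 processors behind one shared link each see 1/64 of it, while a 1024-by-1024 crossbar needs over a million crosspoints.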
>>>>>
>>>>>It's only now, with CPUs so well designed, so highly clocked and executing so
>>>>>many instructions per clock, that those vector blocks can safely be replaced.
>>>>>
>>>>>Note: I bet they still get used because most scientists know shit about
>>>>>programming, and you can't blame them.
>>>>
>>>>Sorry, but a Cray will blow the doors off of _any_ microcomputer you care to
>>>>march up.  It can sustain a ridiculous number of operations per cycle.  IE it
>>>
>>>Gotta love your comparisons :)
>>>
>>>You show up with a Cray supercomputer and I may only bring something I can
>>>carry in my hands :)
>>
>>Feel free to do so.  I'll take a T932 over _anything_ you can carry by
>>hand, no questions asked.
>>
>>
>>>
>>>I would prefer to show up with today's 1440-processor, 3 GFLOPS TERAS,
>>>though :)
>>>
>>>>is _easy_ on a single CPU to add two 64 bit floats, multiply the sum by
>>>>another 64 bit float, add that to another 64 bit float.  And I can do all of
>>>>that, two results per clock cycle, _forever_.
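The operation described above (add two 64-bit floats, multiply the sum by a third, add a fourth) looks like this as an array loop. On a vector Cray the compiler would turn it into vector add and multiply instructions whose pipelines are chained, so that once the pipelines fill, results keep streaming out (Hyatt's figure is two per clock). The scalar C below only shows the data flow; the name `add_mul_add` is hypothetical.

```c
#include <stddef.h>

/* r[i] = (a[i] + b[i]) * c[i] + d[i] for every element. `restrict` tells
   the compiler the arrays do not overlap, which lets it vectorize freely.
   Returns the number of floating-point operations performed. */
size_t add_mul_add(const double *restrict a, const double *restrict b,
                   const double *restrict c, const double *restrict d,
                   double *restrict r, size_t n)
{
    for (size_t i = 0; i < n; i++)
        r[i] = (a[i] + b[i]) * c[i] + d[i];
    return 3 * n;  /* one add, one multiply, one add per element */
}
```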
>>>>
>>>>You have to understand vector processing first, to understand the power of a
>>>>Cray.  Until you grasp that, you are talking nonsense.
>>>
>>>>>
>>>>>Today I spoke with someone who runs a lot of jobs. What he calls a small job
>>>>>is a calculation on 24 processors that runs for 20 hours doing nothing but
>>>>>floating-point calculations.
>>>>>
>>>>>His software has already been running on supercomputers for something like
>>>>>20 years.
>>>>>
>>>>>There are, however, some major differences between today and back then, which
>>>>>is why we spoke. I had promised to help him speed it up.
>>>>>
>>>>>His program fetches its data from huge three-dimensional arrays.
>>>>>
>>>>>However, those are allocated by the first thread that starts.
>>>>>
>>>>>So imagine that one poor thread is eating up all the bandwidth of the machine,
>>>>>and that each cache line takes something like 5 microseconds to arrive.
>>>>>
>>>>>Then he can do 16 calculations per cache line (a 128-byte line divided by the
>>>>>8-byte size of a double). That's extremely expensive.
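The cost of that access pattern can be put in numbers using the figures above: a 128-byte line holds 16 doubles, so if each line takes about 5 microseconds to arrive and nothing overlaps the misses (a worst-case assumption that ignores prefetching and multiple outstanding misses), every double costs about 312.5 ns. The helper names are hypothetical.

```c
/* Number of doubles that fit in one cache line of line_bytes bytes. */
int doubles_per_line(int line_bytes)
{
    return line_bytes / (int)sizeof(double);
}

/* Average latency per double, in ns, if every line costs line_latency_ns and
   misses never overlap (worst case: no prefetching, one miss at a time). */
double ns_per_double(int line_bytes, double line_latency_ns)
{
    return line_latency_ns / doubles_per_line(line_bytes);
}
```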
>>>>>
>>>>>His software can be sped up *quite* a lot.
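The usual remedy for arrays that all end up in the first thread's memory is first-touch placement: on systems that place a page on the NUMA node of the thread that first writes it (the default policy on Linux, for example), the arrays are initialized in parallel with the same loop schedule the compute loops use later. The sketch below assumes OpenMP and a flattened nx*ny*nz array of doubles; `alloc_first_touch` and `sum_planes` are hypothetical names, and this is a generic technique, not necessarily what was done for the program described here.

```c
#include <stddef.h>
#include <stdlib.h>

/* Allocate an nx*ny*nz array of doubles and initialize it in parallel.
   With a static schedule, thread t writes the same block of i-planes it will
   later compute on, so under first-touch those pages land on its own node.
   (Large allocations come fresh from the OS, so they are untouched until
   here.) Without OpenMP the pragma is ignored and this runs serially. */
double *alloc_first_touch(size_t nx, size_t ny, size_t nz)
{
    double *a = malloc(nx * ny * nz * sizeof *a);
    if (a == NULL)
        return NULL;

    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)nx; i++)
        for (size_t j = 0; j < ny; j++)
            for (size_t k = 0; k < nz; k++)
                a[((size_t)i * ny + j) * nz + k] = 0.0;
    return a;
}

/* Example compute loop with the same static schedule, so each thread reads
   mostly the pages it touched first. */
double sum_planes(const double *a, size_t nx, size_t ny, size_t nz)
{
    double s = 0.0;
    #pragma omp parallel for schedule(static) reduction(+:s)
    for (long i = 0; i < (long)nx; i++)
        for (size_t j = 0; j < ny; j++)
            for (size_t k = 0; k < nz; k++)
                s += a[((size_t)i * ny + j) * nz + k];
    return s;
}
```

The key design choice is using the identical `schedule(static)` in both loops; a different schedule in the compute loop would send threads to remote pages again.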
>>>>>
>>>>>Naturally, he also ran this software on Crays in the past (nowadays it's in C;
>>>>>previously it was in Fortran).
>>>>>
>>>>>They just do not know the bottlenecks of today's supercomputers.
>>>>>
>>>>>That's why the Cray was a great thing for them, and they will always remember
>>>>>it for that.
>>>>>
>>>>>Because if you had a processor or 16 with shared memory, and a lookup in that
>>>>>memory was equally fast for every processor, where the data lived never
>>>>>mattered. On this SGI supercomputer it does, so it is obvious that this
>>>>>program, which is definitely a good example of how many programs still are,
>>>>>can easily be sped up something like 20 times.
>>>>>
>>>>>Yet the brute force of the Cray doesn't distinguish. So the Cray is even
>>>>>greater if you think of the average guy who has to do calculations on those
>>>>>machines.
>>>>>
>>>>>Until recently, more than 50% of the total system time went to researchers
>>>>>doing physics (if that's the right English word): calculating models, oil
>>>>>simulations, and bunches of known algorithms and new, unknown ones that get
>>>>>tried on huge matrices.
>>>>
>>>>False.  They are used to design other microprocessors.  Apple owns several.
>>>>They are used for weather forecasting.  Simulations.  _anything_ that requires
>>>>incredibly high operations per second on large data arrays.  NUMA just doesn't
>>>>cut it for many such applications, and message-passing is worse.
>>>>_that_ is the "world of the Crays" and they are untouched there.
>>>
>>>I'm not sure about the microprocessor designs; we can ask AMD and Intel about
>>>that. Apple doesn't produce microprocessors at all. They use IBM processors
>>>nowadays, and before IBM they used Motorola.
>>
>>Apple produces _machines_.  They do circuit layout and testing on a Cray.
>
>Why would anyone use a Cray for circuit layout?

I don't think they did the layout.  I think it was all simulation to test
a design before building it.  When I said layout, I should have left that
out.  They had a big simulator that would take a complete design and run it
through all sorts of timing simulations.  I don't know that they _still_ do
this.  But there were several papers published years ago about their buying
and using four 4-processor Cray XMP boxes for this.


>
>Look at what NVidia uses to design their huge chips - I don't see any Crays
>there. Do Synopsys, Cadence,... support Crays?
>
>As far as simulation acceleration goes, I think that Xilinx Virtex2 parts are
>better than a Cray.
>
>From http://www.clock.org/~fair/computers/sgi-cray.html
>
>'Apple Computer bought a Cray X/MP-48 (four 9ns clock cycle processors, eight
>megawords of RAM) to help design a supercomputer on a chip. The project, alas,
>failed.
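For readers more used to modern units, the X-MP/48 figures quoted above convert as follows, assuming a megaword means 2^20 words and a Cray word is 64 bits: a 9 ns clock period is about 111 MHz, and eight megawords is 64 MiB. The helper names are hypothetical.

```c
/* Clock frequency in MHz for a clock period given in nanoseconds. */
double period_ns_to_mhz(double period_ns)
{
    return 1000.0 / period_ns;
}

/* Bytes occupied by a number of 64-bit (8-byte) Cray words. */
unsigned long long cray_words_to_bytes(unsigned long long words)
{
    return words * 8ULL;
}
```

So `period_ns_to_mhz(9.0)` is about 111.1, and `cray_words_to_bytes(8ULL << 20)` is 67,108,864 bytes.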
>
>A probably apocryphal story: John Sculley met Seymour Cray, and told Seymour,
>"You know, we're using a Cray to design the next Macintosh." Seymour scratched
>his head and thoughtfully replied, "Well, that's funny - I'm using a Macintosh
>to design the next Cray."

I've heard both of those.  :)


>
>Apple's Cray subsequently found a useful life doing plastic flow modelling for
>the injection molds that Apple used for the cases of its products (the Cray cut
>months off the time to produce a production-quality plastic mold tool, and saved
>hundreds of thousands of dollars a shot). It was also a symbol of Apple
>Computer's commitment to having a world-class R&D facility, which served to
>attract many superior computing researchers over the years.
>
>They've all been laid off now, of course.'
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.