Author: Robert Hyatt
Date: 11:02:56 02/03/05
On February 03, 2005 at 13:20:58, Matthew Hull wrote:

>On February 03, 2005 at 11:59:11, Robert Hyatt wrote:
>
>>On February 03, 2005 at 11:17:31, Vincent Diepeveen wrote:
>>
>>>On February 02, 2005 at 11:54:46, Robert Hyatt wrote:
>>>
>>>>On February 02, 2005 at 09:53:03, Vincent Diepeveen wrote:
>>>>
>>>>>On February 01, 2005 at 21:51:19, Robert Hyatt wrote:
>>>>>
>>>>>>On February 01, 2005 at 17:19:26, Vincent Diepeveen wrote:
>>>>>>
>>>>>>>On February 01, 2005 at 16:28:22, Robert Hyatt wrote:
>>>>>>>
>>>>>>>Still didn't read the subject title?
>>>>>>>
>>>>>>>[snip]
>>>>>>>
>>>>>>>>Because a cluster can't offer 1/100th the total memory bandwidth of a big Cray
>>>>>>>>vector box.
>>>>>>>
>>>>>>>Actually today's clusters deliver a factor of 1000 more or so.
>>>>>>>
>>>>>>>Total bandwidth a cluster can deliver is measured nowadays in terabytes per
>>>>>>>second; with Cray it was measured in gigabytes per second.
>>>>>>
>>>>>>Let's see. The last Cray I ran a chess program on was a T932. Each processor
>>>>>>could read 4 words and write 2 words per cycle, with a cycle time of 2ns. So 6
>>>>>>words, 48 bytes per cycle, x 500M cycles per second is about 24 gigabytes per
>>>>>>second, x 32 processors is getting dangerously close to 800 gigabytes per
>>>>>
>>>>>Bandwidth per cpu on the old MIPS was 3.2 GB/s from memory (Origin3000 series),
>>>>>and bandwidth per cpu on the Altix3000 using network4 is 8.2 gigabytes per second
>>>>>from memory.
>>>>>
>>>>>So what that Cray streamed was impressive for its day, but it delivered to
>>>>>just a few cpu's; that was the main problem. And this for massive power
>>>>>consumption.
>>>>>
>>>>>What we speak of now is that you effectively get the same bandwidth from memory
>>>>>to each cpu, but systems go up to 130000+ processors.
>>>>>
>>>>>>second. A "cluster" can have more theoretical bandwidth, but rarely as much
>>>>>>_real_ bandwidth. This is on a shared-memory machine that can run real codes
>>>>>>quite well.
>>>>>
>>>>>>>
>>>>>>>Note it's the same network that gets used for huge Cray T3E's, but a newer and
>>>>>>>bigger version, that's all.
>>>>>>
>>>>>>The T3E isn't a vector computer.
>>>>>
>>>>>The processor used (the Alpha) was out of order, yet achieves the same main
>>>>>objective: executing more than 1 instruction a cycle effectively.
>>>>>
>>>>>Itanium2, objectively seen, is a vector processor as it executes 2 bundles at
>>>>>once. Though they call that IPF nowadays.
>>>>
>>>>That's not a vector architecture. A vector machine executes _one_ instruction
>>>>and produces a large number of results. For example, current Cray vector boxes
>>>>can produce 128 results by executing a single instruction. That is why MIPS was
>>>>dropped and FLOPS became the measure for high-performance computing.
>>>
>>>IPF executes 2 bundles per cycle.
>>>
>>>1 bundle = 3 instructions in IPF.
>>>
>>>You can see that as a vector.
>>
>>Maybe _you_ can see that as a vector. No person familiar with computer
>>architecture sees that as a vector. No architecture textbook calls that a
>>vector. I stick to common definitions of words, not your privately twisted
>>definitions that nobody can communicate with.
>>
>>Pick up a copy of Hennessy/Patterson's architecture book and look up "vector
>>operations" in the index. VLIW is _not_ vector. "Bundles" are _not_ vector.
>>
>>>
>>>>
>>>>>
>>>>>All x86-64, which are taking over now, do 3 instructions a cycle and
>>>>>deliver up to 2 flops a cycle.
>>>>>
>>>>>>>Crays usually had, when in vector mode, what was it, 4 cpu's or so? Sometimes up
>>>>>>>to 128. Above that it was the T3E, which had Alphas.
>>>>>>>
>>>>>>>That one usually used Quadrics :)
>>>>>>>
>>>>>>>However, look at France now. New great supercomputer. 8192 processors or so.
>>>>>>>Say 2048 nodes. You're looking at 3.6 TB per second bandwidth :)
>>>>>
>>>>>>For a synthetic benchmark, not a real code; that's the problem with clusters so
>>>>>>far...
>>>>>
>>>>>It's the speed the memory delivers to the cpu's.
>>>>>
>>>>>Nothing synthetic.
>>>>
>>>>Didn't think you would understand that. It is about "theoretical peak" vs
>>>>"sustained peak for real applications". The numbers are not the same.
>>>
>>>I know you will act innocent here. But you just have no idea, of course.
>>>
>>>A 'cheapo' high-end network card can get 800Mb/s.
>>
>>Not if two machines try to talk to the same node it can't. Not if there is
>>congestion in the router it can't. Not if there are multiple router hops
>>between the two points it can't. Etc...
>>
>>>
>>>>>
>>>>>>>Those Crays you remember were 100MHz ones. The network could of course deliver
>>>>>>>exactly what the cpu could calculate.
>>>>>>
>>>>>>There was no "network", and the Crays were 500MHz, although on a fully pipelined
>>>>>>vector machine that can do 5-10 operations per cycle that is not exactly a good
>>>>>>measure of performance.
>>>>>
>>>>>Operations don't count. Flops do.
>>>>>
>>>>>There is actually a 1GHz Cray here with 256KB cache.
>>>>
>>>>Don't know what Cray you have there, but the cache is not normal cache. It is
>>>>only for "scalar" operations. Vector operations don't go through the scalar
>>>>cache on a Cray, because all memory reads and writes are pipelined and deliver
>>>>values every clock cycle after the latency delay for the first word.
>>>
>>>www.cray.com
>>
>>What should I look for? I have the manuals for every vector machine they have
>>ever produced/shipped in my office.
>>
>>
>>>
>>>>>
>>>>>>>Not so great if you look at the total number of Gflops it delivered. Nowadays
>>>>>>>the big clusters, as all big supercomputers nowadays are clusters, are measured
>>>>>>>in Tflops, and one already in Pflops :)
>>>>>>>
>>>>>>>There is a 0.36 Pflop one under construction now :)
>>>>>>>
>>>>>>>Vincent
>>>>>
>>>>>>Different computers for different applications. Ask a real programmer which he
>>>>>>would rather write code for...
>>>>>
>>>>>They'll all pick the fastest machine, which is the one delivering the most flops.
>>>>>
>>>>>Vincent
>>>>
>>>>No. Otherwise there would be no machines like the Cray, Fujitsu, Hitachi, etc.
>>>>left. Using a large number of processors in a message-passing architecture is
>>>>not as easy as using a small number of faster processors in a shared-memory
>>>>architecture, for many applications.
>>>
>>>You confuse one kind of network card with another. There are also network cards
>>>that allow DSM. See www.quadrics.com for details. They have RAM on the chip. No
>>>need for message passing. You can read remotely from that RAM straight over the
>>>network without the remote cpu needing to handle a message.
>>
>>Please get into the world of standard definitions. It does not matter whether the
>>remote CPU has to see the "message" or not. It is still "message passing". The
>>VIA architecture used by our cLAN stuff supports that type of shared memory, but
>>the two cards still send messages over the network router. Shared-memory
>>systems don't send "messages".
>>
>>
>>
>>>
>>>Note all those high-end vendors are in serious financial trouble now.
>>>Probably several will go bankrupt soon. SGI right now can stay alive a bit thanks
>>>to Intel giving them Itanium2 cpu's for near free. Cray already has major problems.
>>>They are pretty expensive anyway. 50k dollars for 1 node is what I call
>>>expensive. 1 node has 12 cpu's (Opterons).
>>>
>>>Just about everything above 4 cpu's will die. (High-end) network cards will take
>>>over.
>>>
>>>Programmers simply are cheaper than hardware; they will have to adapt to
>>>networks.
>>>
>>>Vincent
>>
>>That last statement is so far beyond false it takes sunlight six years from the
>>time it reaches "false" until it reaches that statement. _Any_ good CS book
>>today _always_ contains the quote: "In the 60's, the cost of developing software
>>was _far_ exceeded by the cost of the hardware it was run on, so efficiency of
>>the programming itself was paramount.
>>Today the cost of developing the software
>>far exceeds the cost of the system it runs on, so controlling the development
>>cost is what software engineering is all about."
>>
>>
>>That was not just a wrong statement, it was a _grossly_ wrong statement.
>>
>>Pick up any good software engineering / software development textbook, and
>>learn something.
>>
>>
>>Who out there besides Vincent thinks hardware costs exceed software development
>>costs in today's computing world???
>
>
>Indeed. The biggest line item in a corporation's cost structure is the payroll.
>Programmers are the first thing to go in cost-reduction efforts, because the
>numbers are so big.

It was a completely idiotic statement that 99.99999% of the world knows is wrong. Too many different CS books can be found to show it is wrong...