Computer Chess Club Archives



Subject: Re: Correction hydra hardware

Author: Matthew Hull

Date: 10:20:58 02/03/05



On February 03, 2005 at 11:59:11, Robert Hyatt wrote:

>On February 03, 2005 at 11:17:31, Vincent Diepeveen wrote:
>
>>On February 02, 2005 at 11:54:46, Robert Hyatt wrote:
>>
>>>On February 02, 2005 at 09:53:03, Vincent Diepeveen wrote:
>>>
>>>>On February 01, 2005 at 21:51:19, Robert Hyatt wrote:
>>>>
>>>>>On February 01, 2005 at 17:19:26, Vincent Diepeveen wrote:
>>>>>
>>>>>>On February 01, 2005 at 16:28:22, Robert Hyatt wrote:
>>>>>>
>>>>>>Still haven't read the subject title?
>>>>>>
>>>>>>[snip]
>>>>>>
>>>>>>>Because a cluster can't offer 1/100th the total memory bandwidth of a big Cray
>>>>>>>vector box.
>>>>>>
>>>>>>Actually, today's clusters deliver a factor of 1000 more or so.
>>>>>>
>>>>>>The total bandwidth a cluster can deliver is nowadays measured in terabytes
>>>>>>per second; with the Cray it was measured in gigabytes per second.
>>>>>
>>>>>Let's see.  The last Cray I ran on with a chess program was a T932.  Each
>>>>>processor could read 4 words and write 2 words per cycle, with a cycle time of
>>>>>2ns.  So 6 words, 48 bytes per cycle, x 500M cycles per second, is 24 gigabytes
>>>>>per second per processor; x 32 processors is getting close to 768 gigabytes per
>>>>
>>>>Bandwidth per CPU on the old MIPS (Origin 3000 series) was 3.2 GB/s from
>>>>memory, and on the Altix 3000, using the version-4 network, a CPU gets 8.2
>>>>gigabytes per second from memory.
>>>>
>>>>So what the Cray streamed there was impressive for its day, but it delivered
>>>>it to just a few CPUs; that was the main problem.  And that came with massive
>>>>power consumption.
>>>>
>>>>What we are talking about now is that you effectively get the same bandwidth
>>>>from memory to each CPU, but systems go up to 130,000+ processors.
>>>>
>>>>>second.  A "cluster" can have more theoretical bandwidth, but rarely as much
>>>>>_real_ bandwidth.  This is on a shared memory machine that can do real codes
>>>>>quite well.
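
As a sanity check of the T932 arithmetic quoted above, here is a minimal C
sketch using only the figures from the post (4 reads + 2 writes of 8-byte
words per cycle, a 2ns cycle time, 32 processors):

    #include <stdio.h>

    int main(void) {
        /* Figures quoted in the post for the Cray T932 */
        double words_per_cycle = 4 + 2;   /* 4 reads + 2 writes */
        double bytes_per_word  = 8;       /* 64-bit words */
        double cycle_time_ns   = 2.0;     /* 2ns => 500 MHz */
        int    processors      = 32;

        double cycles_per_sec = 1e9 / cycle_time_ns;          /* 500M */
        double per_cpu_bw = words_per_cycle * bytes_per_word
                            * cycles_per_sec;                 /* 24 GB/s */
        double total_bw   = per_cpu_bw * processors;          /* 768 GB/s */

        printf("per-CPU: %.0f GB/s, machine: %.0f GB/s\n",
               per_cpu_bw / 1e9, total_bw / 1e9);
        return 0;
    }
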
>>>>
>>>>>>
>>>>>>Note it's the same network that gets used for the huge Cray T3Es, just a newer
>>>>>>and bigger version, that's all.
>>>>>
>>>>>T3E isn't a vector computer.
>>>>
>>>>The processor it used (the Alpha) was out-of-order, yet it achieves the same
>>>>main objective: effectively executing more than 1 instruction per cycle.
>>>>
>>>>Itanium2, objectively seen, is a vector processor, as it executes 2 bundles at
>>>>once.  Though they call that IPF nowadays.
>>>
>>>That's not a vector architecture.  A vector machine executes _one_ instruction
>>>and produces a large number of results.  For example, current cray vector boxes
>>>can produce 128 results by executing a single instruction.  That is why MIPS was
>>>dropped and FLOPS became the measure for high-performance computing.
>>
>>IPF executes 2 bundles per cycle.
>>
>>1 bundle = 3 instructions in IPF
>>
>>You can see that as a vector.
>
>Maybe _you_ can see that as a vector.  No person familiar with computer
>architecture sees that as a vector.  No architecture textbook calls that a
>vector.  I stick to common definitions of words, not your privately twisted
>definitions that nobody can communicate with.
>
>Pick up a copy of Hennessy/Patterson's architecture book and look up "vector
>operations" in the index.  VLIW is _not_ vector.  "bundles" are _not_ vector.
>
>>
>>>
>>>>
>>>>All the x86-64 chips that are taking over now execute 3 instructions per cycle
>>>>and deliver up to 2 flops per cycle.
>>>>
>>>>>>The vector Crays usually had what, 4 CPUs or so?  Sometimes up to 128.  Above
>>>>>>that it was the T3E, which had Alphas.
>>>>>>
>>>>>>that one usually used Quadrics :)
>>>>>>
>>>>>>However, look at France now.  A great new supercomputer: 8192 processors or
>>>>>>so, say 2048 nodes.  You're looking at 3.6 TB per second of bandwidth :)
>>>>
>>>>>For a synthetic benchmark, not a real code, that's the problem with clusters so
>>>>>far...
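
Dividing out the aggregate figure shows the scale per CPU; a quick C check
using only the numbers quoted above (8192 processors, 2048 nodes, 3.6 TB/s):

    #include <stdio.h>

    int main(void) {
        double total_bw = 3.6e12;        /* 3.6 TB/s aggregate, as quoted */
        int nodes = 2048, cpus = 8192;   /* counts from the post */
        printf("per node: %.2f GB/s, per CPU: %.2f GB/s\n",
               total_bw / nodes / 1e9, total_bw / cpus / 1e9);
        /* prints: per node: 1.76 GB/s, per CPU: 0.44 GB/s */
        return 0;
    }
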
>>>>
>>>>It's the speed at which the memory delivers data to the CPUs.
>>>>
>>>>Nothing synthetic.
>>>
>>>Didn't think you would understand that.  It is about "theoretical peak" vs
>>>"sustained peak for real applications". The numbers are not the same.
>>
>>I know you will act innocent here, but of course you just have no idea.
>>
>>A 'cheapo' high-end network card can get 800MB/s.
>
>Not if two machines try to talk to the same node it can't.  Not if there is
>congestion in the router it can't.  Not if there are multiple router hops
>between the two points it can't.  Etc...
>
>>
>>>>
>>>>>>Those Crays you remember were 100MHz ones.  The network could of course
>>>>>>deliver exactly what the CPU could calculate.
>>>>>
>>>>>There was no "network", and the Crays were 500MHz, although on a fully
>>>>>pipelined vector machine that can do 5-10 operations per cycle, clock speed is
>>>>>not exactly a good measure of performance.
>>>>
>>>>Operations don't count.  Flops do.
>>>>
>>>>There is actually a 1GHz Cray here with 256KB of cache.
>>>
>>>Don't know what Cray you have there, but the cache is not normal cache.  It is
>>>only for "scalar" operations.  Vector operations don't go through the scalar
>>>cache on a Cray, because all memory reads and writes are pipelined and deliver
>>>values every clock cycle after the latency delay for the first word.
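
The pipelining claim above has a simple model: a vector load of n words pays
the first-word latency once, then delivers one word per clock, so effective
bandwidth approaches one word per cycle as n grows.  A small C sketch of that
model (the 50-cycle latency is an illustrative placeholder, not a measured
Cray figure; the 500MHz clock is from the post):

    #include <stdio.h>

    int main(void) {
        double latency = 50.0;       /* illustrative first-word latency, cycles */
        double clock_hz = 500e6;     /* 500 MHz clock, per the post */
        double bytes_per_word = 8.0; /* peak = 1 word/cycle = 4 GB/s */

        for (int n = 1; n <= 128; n *= 2) {
            double cycles = latency + n;   /* pay the latency only once */
            double bw = n * bytes_per_word * clock_hz / cycles;
            printf("vector length %3d: %5.2f GB/s effective\n", n, bw / 1e9);
        }
        return 0;
    }
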
>>
>>www.cray.com
>
>What should I look for?  I have the manuals for every vector machine they have
>ever produced/shipped in my office.
>
>
>>
>>>>
>>>>>>Not so great if you look at the total number of Gflops it delivered.  Nowadays
>>>>>>the big clusters (and all big supercomputers nowadays are clusters) are
>>>>>>measured in Tflops, and one is already measured in Pflops :)
>>>>>>
>>>>>>There is a 0.36 Pflop one now under construction :)
>>>>>>
>>>>>>Vincent
>>>>
>>>>>Different computer for different applications.  Ask a real programmer which he
>>>>>would rather write code for...
>>>>
>>>>They'll all pick the fastest machine, which is the one delivering the most flops.
>>>>
>>>>Vincent
>>>
>>>No.  Otherwise there would be no machines like the Cray, Fujitsu, Hitachi, etc.
>>>left.  Using a large number of processors in a message-passing architecture is
>>>not as easy as using a small number of faster processors in a shared-memory
>>>architecture, for many applications.
>>
>>You are lumping all network cards together.  There are also network cards that
>>allow DSM.  See www.quadrics.com for details.  They have RAM on the chip.  No
>>need for message passing.  You can read remotely, straight over the network,
>>from that RAM without the remote CPU needing to handle a message.
>
>Please get into the world of standard definitions.  Does not matter whether the
>remote CPU has to see the "message" or not.  It is still "message passing".  The
>VIA architecture used by our cLAN stuff supports that type of shared memory, but
>the two cards still send messages over the network router.  Shared memory
>systems don't send "messages".
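
The dispute here maps onto a real API distinction.  In standard MPI terms: a
two-sided transfer needs both CPUs to make a call, while a one-sided put (the
style of remote access Quadrics-class cards support) lets one process write
remote memory without the remote CPU posting a receive, although a message
still crosses the network underneath.  A minimal sketch using standard MPI-2
calls (run with 2 ranks; error handling omitted for brevity):

    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank;
        double buf = 3.14;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Two-sided message passing: BOTH CPUs must participate. */
        if (rank == 0)
            MPI_Send(&buf, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(&buf, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);

        /* One-sided put: rank 0 writes into rank 1's exposed window;
         * rank 1's CPU never posts a receive -- but the data still
         * travels as a message over the interconnect underneath. */
        MPI_Win win;
        MPI_Win_create(&buf, sizeof buf, sizeof buf, MPI_INFO_NULL,
                       MPI_COMM_WORLD, &win);
        MPI_Win_fence(0, win);
        if (rank == 0)
            MPI_Put(&buf, 1, MPI_DOUBLE, 1, 0, 1, MPI_DOUBLE, win);
        MPI_Win_fence(0, win);

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }
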
>
>
>
>>
>>Note that all those high-end vendors are in serious financial trouble now;
>>several will probably go bankrupt soon.  SGI can stay alive a bit right now
>>thanks to Intel giving them Itanium2 CPUs for nearly free.  Cray already has
>>major problems.  They are pretty expensive anyway; $50k for 1 node is what I
>>call expensive, and 1 node has 12 CPUs (Opterons).
>>
>>Just about everything above 4 CPUs will die.  (High-end) network cards will
>>take over.
>>
>>Programmers are simply cheaper than hardware; they will have to adapt to
>>networks.
>>
>>Vincent
>
>That last statement is so far beyond false it takes sunlight six years from the
>time it reaches "false" until it reaches that statement.  _any_ good CS book
>today _always_ contains the quote "In the 60's, the cost of developing software
>was _far_ exceeded by the cost of the hardware it was run on, so efficiency of
>the programming itself was paramount.  Today the cost of developing the software
>far exceeds the cost of the system it runs on, so controlling the development
>cost is what software engineering is all about."
>
>
>That was not just a wrong statement, it was a _grossly_ wrong statement.
>
>Pick up any good software engineering / software development text book, and
>learn something.
>
>
>Who out there besides Vincent thinks hardware costs exceed software development
>costs in today's computing world???


Indeed.  The biggest line item in a corporation's cost structure is the payroll.
Programmers are the first thing to go in cost-reduction efforts, because the
numbers are so big.


