Computer Chess Club Archives

Subject: Re: Correction hydra hardware

Author: Vincent Diepeveen
Date: 15:57:19 02/03/05
On February 03, 2005 at 11:59:11, Robert Hyatt wrote:

>On February 03, 2005 at 11:17:31, Vincent Diepeveen wrote:
>
>>On February 02, 2005 at 11:54:46, Robert Hyatt wrote:
>>
>>>On February 02, 2005 at 09:53:03, Vincent Diepeveen wrote:
>>>
>>>>On February 01, 2005 at 21:51:19, Robert Hyatt wrote:
>>>>
>>>>>On February 01, 2005 at 17:19:26, Vincent Diepeveen wrote:
>>>>>
>>>>>>On February 01, 2005 at 16:28:22, Robert Hyatt wrote:
>>>>>>
>>>>>>Still didn't read the subject title?
>>>>>>
>>>>>>[snip]
>>>>>>
>>>>>>>Because a cluster can't offer 1/100th the total memory bandwidth of a big Cray
>>>>>>>vector box.
>>>>>>
>>>>>>Actually, today's clusters deliver a factor of 1000 more or so.
>>>>>>
>>>>>>The total bandwidth a cluster can deliver is nowadays measured in terabytes
>>>>>>per second; with Cray it was measured in gigabytes per second.
>>>>>
>>>>>Let's see.  The last Cray I ran on with a chess program was a T932.  Processor
>>>>>could read 4 words and write two words per cycle, cycle time was 2ns.  So 6
>>>>>words, 48 bytes per cycle, x 500M cycles per second is about 24 gigabytes per
>>>>>second, x 32 processors is well over 700 gigabytes per
>>>>
>>>>Per-CPU bandwidth from memory on the old MIPS machines (Origin 3000 series)
>>>>was 3.2 GB/s, and on the Altix 3000 using network4 it is 8.2 GB/s per CPU.
>>>>
>>>>So what Cray streamed there was impressive for its day, but it delivered that
>>>>to just a few CPUs, and that was the main problem, all at massive power
>>>>consumption.
>>>>
>>>>What we are talking about now is that you effectively get the same bandwidth
>>>>from memory to each CPU, but systems go up to 130,000+ processors.
>>>>
>>>>>second.  A "cluster" can have more theoretical bandwidth, but rarely as much
>>>>>_real_ bandwidth.  This is on a shared memory machine that can do real codes
>>>>>quite well.
>>>>
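Multiplying out the T932 figures quoted above (4 reads plus 2 writes of 8-byte words per cycle, a 2 ns cycle, 32 processors) is plain arithmetic. The sketch below just does that multiplication; it assumes every memory port is busy on every cycle, so the results are theoretical peaks, not sustained bandwidth for real codes.

```python
# Peak memory bandwidth from the quoted T932 parameters.
# Assumption: 8-byte (64-bit) Cray words, every port busy on every cycle,
# so these are theoretical peaks rather than sustained figures.
WORD_BYTES = 8

def peak_bandwidth(reads_per_cycle, writes_per_cycle, cycle_ns, processors=1):
    """Return peak bytes per second summed over `processors` CPUs."""
    bytes_per_cycle = (reads_per_cycle + writes_per_cycle) * WORD_BYTES
    cycles_per_second = 1e9 / cycle_ns
    return bytes_per_cycle * cycles_per_second * processors

per_cpu = peak_bandwidth(4, 2, 2.0)
total = peak_bandwidth(4, 2, 2.0, processors=32)
print(f"per CPU: {per_cpu / 1e9:.0f} GB/s, 32 CPUs: {total / 1e9:.0f} GB/s")
# per CPU: 24 GB/s, 32 CPUs: 768 GB/s
```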
>>>>>>
>>>>>>Note it's the same network that gets used for huge Cray T3E's, but a newer and
>>>>>>bigger version, that's all.
>>>>>
>>>>>T3E isn't a vector computer.
>>>>
>>>>The processor used (Alpha) was out-of-order, yet it achieves the same main
>>>>objective: effectively executing more than 1 instruction per cycle.
>>>>
>>>>Objectively seen, Itanium2 is a vector processor, as it executes 2 bundles at
>>>>once, though they call that IPF nowadays.
>>>
>>>That's not a vector architecture.  A vector machine executes _one_ instruction
>>>and produces a large number of results.  For example, current Cray vector boxes
>>>can produce 128 results by executing a single instruction.  That is why MIPS was
>>>dropped and FLOPS became the measure for high-performance computing.
>>
>>IPF executes 2 bundles per cycle.
>>
>>1 bundle = 3 instructions in IPF
>>
>>You can see that as a vector.
>
>Maybe _you_ can see that as a vector.  No person familiar with computer
>architecture sees that as a vector.  No architecture textbook calls that a
>vector.  I stick to common definitions of words, not your privately twisted
>definitions that nobody can communicate with.
>Pick up a copy of Hennessy/Patterson's architecture book and look up "vector
>operations" in the index.  VLIW is _not_ vector.  "bundles" are _not_ vector.

A textbook definition of a vector is not holy. What matters is the benefit it
gives the programmer. Programmers just care how many instructions per cycle you
can execute times the MHz of the processor, and there Cray loses big time to
today's cheap processors.
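The disagreement comes down to what one instruction produces. A toy count, assuming a vector length of 128 (the figure quoted for current Cray vector boxes) and IPF bundles of 3 instruction slots with one result per slot; stalls, startup latency and real scheduling constraints are ignored, and the function names are purely illustrative.

```python
import math

VECTOR_LENGTH = 128   # results per vector instruction (figure quoted above)
SLOTS_PER_BUNDLE = 3  # one IPF bundle = 3 instructions, one result each

def vector_instructions(n_results):
    """Instructions a vector machine issues to produce n_results elements."""
    return math.ceil(n_results / VECTOR_LENGTH)

def ipf_instructions_and_bundles(n_results):
    """A bundle-based machine still needs one instruction per result;
    bundling only groups them 3 at a time for issue."""
    instructions = n_results
    bundles = math.ceil(instructions / SLOTS_PER_BUNDLE)
    return instructions, bundles

n = 1024
print(vector_instructions(n))           # 8
print(ipf_instructions_and_bundles(n))  # (1024, 342)
```

Bundling raises the issue rate (2 bundles per cycle means up to 6 instructions per cycle), but the instruction count still grows one-for-one with the number of results, which is the distinction the textbook definition draws.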

>>>
>>>>
>>>>All the x86-64 processors, which are taking over now, execute 3
>>>>instructions per cycle and deliver up to 2 flops per cycle.
>>>>
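Peak floating-point rate is flops per cycle times clock rate times processor count. The sketch plugs in the 2 flops per cycle quoted for x86-64; the 2 GHz clock and the 1000-processor count are hypothetical values chosen only for illustration.

```python
def peak_flops(flops_per_cycle, clock_hz, processors):
    """Theoretical peak: every processor retires flops_per_cycle every cycle."""
    return flops_per_cycle * clock_hz * processors

# Hypothetical cluster: 1000 processors at 2 GHz, 2 flops per cycle each.
peak = peak_flops(2, 2.0e9, 1000)
print(f"{peak / 1e12:.0f} Tflops")  # 4 Tflops
```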
>>>>>>The vector Crays usually had, what was it, 4 CPUs or so? Sometimes up to
>>>>>>128. Above that it was the T3E, which had Alphas.
>>>>>>
>>>>>>That one usually used Quadrics :)
>>>>>>
>>>>>>However, look at France now: a great new supercomputer with 8192 processors
>>>>>>or so, say 2048 nodes. You're looking at 3.6 TB per second of bandwidth :)
>>>>
>>>>>For a synthetic benchmark, not a real code, that's the problem with clusters so
>>>>>far...
>>>>
>>>>It's the speed the memory delivers to the cpu's.
>>>>
>>>>Nothing synthetic.
>>>
>>>Didn't think you would understand that.  It is about "theoretical peak" vs
>>>"sustained peak for real applications". The numbers are not the same.
>>
>>I know you will act innocent here. But you just have no idea of course.
>>
>>A 'cheapo' high-end network card can get 800Mb/s.
>
>Not if two machines try to talk to the same node it can't.  Not if there is
>congestion in the router it can't.  Not if there are multiple router hops
>between the two points it can't.  Etc...
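A toy fair-share model of the point about two machines talking to the same node: if the receiving node's link is the bottleneck and is shared evenly, each sender gets the link rate divided by the number of senders. The 800 figure is the one quoted above; the even split is an assumption, and real routers lose more to congestion and extra hops.

```python
def per_sender_bandwidth(link_rate, senders):
    """Even split of one receiving link among concurrent senders (assumed)."""
    if senders < 1:
        raise ValueError("need at least one sender")
    return link_rate / senders

LINK_RATE = 800  # the per-card figure quoted above (800Mb/s)
for senders in (1, 2, 4):
    print(senders, per_sender_bandwidth(LINK_RATE, senders))
# 1 800.0
# 2 400.0
# 4 200.0
```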

Now you're running in circles. Obviously a programmer must be good to get the
maximum out of a machine; there's no discussion about that.

>>
>>>>
>>>>>>Those Crays you remember were 100MHz ones. The network could of course
>>>>>>deliver exactly what the CPU could calculate.
>>>>>
>>>>>There was no "network" and the crays were 500mhz although on a fully pipelined
>>>>>vector machine that can do 5-10 operations per cycle that is not exactly a good
>>>>>measure of performance.
>>>>
>>>>Operations don't count. Flops do.
>>>>
>>>>There is actually a 1GHz Cray here with 256KB of cache.
>>>
>>>Don't know what Cray you have there, but the cache is not normal cache.  It is
>>>only for "scalar" operations.  Vector operations don't go through the scalar
>>>cache on a Cray, because all memory reads and writes are pipelined and deliver
>>>values every clock cycle after the latency delay for the first word.
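The pipelining point can be written as a formula: after an initial latency of L cycles for the first word, one word arrives per clock, so an n-word vector load finishes after about L + (n - 1) cycles. A sketch with a hypothetical first-word latency of 60 cycles; the real figure varies by machine and is not given here.

```python
def pipelined_load_cycles(n_words, latency):
    """Cycles until the last of n_words arrives: the first word after
    `latency` cycles, then one word per clock."""
    if n_words <= 0:
        return 0
    return latency + (n_words - 1)

LATENCY = 60  # hypothetical first-word latency in cycles
cycles = pipelined_load_cycles(128, LATENCY)
print(cycles, round(128 / cycles, 2))  # 187 0.68
```

The longer the vector, the closer the delivered rate gets to one word per clock, since the latency is paid only once per load.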
>>
>>www.cray.com
>
>What should I look for?  I have the manuals for every vector machine they have
>ever produced/shipped in my office.
>

That will make good reading for you when you retire.

>>>>>>Not so great if you look at the total number of Gflops it delivered.
>>>>>>Nowadays the big clusters (and all big supercomputers are clusters nowadays)
>>>>>>are measured in Tflops, and one already in Pflops :)
>>>>>>
>>>>>>There is a 0.36 Pflop one now under construction :)
>>>>>>
>>>>>>Vincent
>>>>
>>>>>Different computer for different applications.  Ask a real programmer which he
>>>>>would rather write code for...
>>>>
>>>>They'll all pick the fastest machine, which is the one delivering the most flops.
>>>>
>>>>Vincent
>>>
>>>No.  Otherwise there would be no machines like the Cray, Fujitsu, Hitachi, etc
>>>left.  Using a large number of processors in a message-passing architecture is
>>>not as easy as using a small number of faster processors in a shared-memory
>>>architecture, for many applications.
>>
>>You're lumping all network cards together. There are also network cards that
>>allow DSM (distributed shared memory); see www.quadrics.com for details. They
>>have RAM on the card, so there is no need for message passing: you can read
>>remotely, straight over the network, from that RAM without the remote CPU
>>having to handle a message.
>
>Please get into the world of standard definitions.  Does not matter whether the
>remote CPU has to see the "message" or not.  It is still "message passing".  The
>VIA architecture used by our cLAN stuff supports that type of shared memory, but

cLAN is not optimal for what we do in computer chess. Get DSM network cards.

Get either Dolphin or Quadrics.

>the two cards still send messages over the network router.  Shared memory
>systems don't send "messages".

You can't avoid shipping data over the network routers and switches. I don't
classify remote memory lookups as messages, though, but as remote memory
lookups. In my view they turn into messages at the moment the network card has
to read outside the memory that's on the card, or when there is no memory on
the card at all.

Myrinet cards have just a few tens of kilobytes of buffer, which is obviously
too little. The same goes for cLAN-type cards like Emulex.

Take Quadrics cards, for example in Cray T3Es or Itanium2 supercomputers: they
just get remote data from the memory on the Quadrics card, which of course has
its own roughly 300MHz processor to serve memory from the card itself when
using the shmem library.

That's 100% how you did it at Cray.

It's pretty much the same library, now running on Linux.
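The two positions in this exchange can be contrasted with a toy model: in a two-sided (request/reply) exchange the remote CPU must run a handler to answer, while in a one-sided get, in the shmem style described here, the card serves the read without the remote CPU. Both still move data across the network, which is the counterpoint above. The classes and counters are hypothetical and model neither a real card nor the actual shmem API.

```python
class Node:
    """A node with local memory and a count of CPU interventions."""
    def __init__(self, memory):
        self.memory = list(memory)
        self.cpu_handled = 0

    def handle_request(self, index):
        # Two-sided: the remote CPU runs code to service the request.
        self.cpu_handled += 1
        return self.memory[index]

class Network:
    """Counts every transfer that crosses the network."""
    def __init__(self):
        self.transfers = 0

    def two_sided_read(self, remote, index):
        self.transfers += 2  # request out, reply back
        return remote.handle_request(index)

    def one_sided_get(self, remote, index):
        self.transfers += 2  # request and data still cross the network
        # Stands in for the card serving the read; remote CPU untouched.
        return remote.memory[index]

net = Network()
a = Node([10, 20, 30])
print(net.two_sided_read(a, 1), a.cpu_handled, net.transfers)  # 20 1 2
print(net.one_sided_get(a, 2), a.cpu_handled, net.transfers)   # 30 1 4
```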

>
>
>>
>>Note that all those high-end vendors are in serious financial trouble now.
>>Several will probably go bankrupt soon. SGI can stay alive a bit right now
>>thanks to Intel giving them Itanium2 CPUs almost for free. Cray already has
>>major problems. They are pretty expensive anyway: 50k dollars for 1 node is
>>what I call expensive, and 1 node has 12 CPUs (Opterons).
>>
>>Just about everything above 4 CPUs will die. (High-end) network cards will
>>take over.
>>
>>Programmers are simply cheaper than hardware; they will have to adapt to
>>networks.
>>
>>Vincent
>
>That last statement is so far beyond false it takes sunlight six years from the
>time it reaches "false" until it reaches that statement.  _any_ good CS book
>today _always_ contains the quote "In the 60's, the cost of developing software
>was _far_ exceeded by the cost of the hardware it was run on, so efficiency of
>the programming itself was paramount.  Today the cost of developing the software
>far exceeds the cost of the system it runs on, so controlling the development
>cost is what software engineering is all about."
>That was not just a wrong statement, it was a _grossly_ wrong statement.
>Pick up any good software engineering / software development text book, and
>learn something.

In supercomputing the statement obviously holds true.

Around the year 2000, a 16-processor shared-memory Alpha cost about 10 million
dollars. Likewise, a 16-processor Sun cost around 10 million dollars.

However, for that amount of money you can now get a cluster of several thousand
processors with Quadrics or Dolphin.

So obviously paying the programmer is cheaper than the alternative to networked
PCs.

In high-end computing the books are obviously wrong.

For 100k euro you can hire 20 programmers for a year in India.
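The cost figures in this thread reduce to per-unit arithmetic. The sketch divides out the quoted prices: 10 million dollars for a 16-processor shared-memory machine, 50k dollars for a 12-CPU node, and 100k euros for 20 programmer-years. It takes those prices at face value and does not convert between dollars and euros.

```python
def per_unit(total_cost, units):
    """Cost per processor, per CPU, or per programmer-year."""
    return total_cost / units

smp_per_cpu = per_unit(10_000_000, 16)   # dollars per processor, 16-way SMP
node_per_cpu = per_unit(50_000, 12)      # dollars per CPU, 12-CPU node
programmer_year = per_unit(100_000, 20)  # euros per programmer-year

print(round(smp_per_cpu), round(node_per_cpu), round(programmer_year))
# 625000 4167 5000
```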

Do you know what a 16-processor Oracle license costs?

>Who out there besides Vincent thinks hardware costs exceed software development
>costs in today's computing world???

HIGHEND computing world.

We're talking about high-end supercomputers here, not your dual Xeon.

Vincent
Re: Correction hydra hardware Robert Hyatt 20:08:28 02/03/05
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.