Computer Chess Club Archives


Subject: Re: Correction hydra hardware

Author: Vincent Diepeveen

Date: 06:53:03 02/02/05


On February 01, 2005 at 21:51:19, Robert Hyatt wrote:

>On February 01, 2005 at 17:19:26, Vincent Diepeveen wrote:
>
>>On February 01, 2005 at 16:28:22, Robert Hyatt wrote:
>>
>>Still didn't read the subject title?
>>
>>[snip]
>>
>>>Because a cluster can't offer 1/100th the total memory bandwidth of a big Cray
>>>vector box.
>>
>>Actually todays clusters deliver a factor 1000 more or so.
>>
>>Total bandwidth a cluster can deliver is measured nowadays in Terabytes per
>>second, with Cray it was measured in gigabytes per second.
>
>Let's see.  The last Cray I ran on with a chess program was a T932.  Processor
>could read 4 words and write two words per cycle, cycle time was 2ns.  So 6
>words, 48 bytes per cycle, x 500M cycles per second is about 2.5 gigabytes per
>second, x 32 processors is getting dangerously close to 100 gigabytes per

Bandwidth per CPU from memory on the old MIPS machines (origin3000 series) was
3.2 GB/s, and on the altix3000 using network4 it is 8.2 GB/s per CPU from
memory.

So what the Cray streamed there was impressive for its day, but it delivered
that to just a few CPUs; that was the main problem. And it did so at massive
power consumption.

What we are talking about now is that each CPU gets effectively the same
bandwidth from memory, but systems scale up to 130000+ processors.

>second.  A "cluster" can have more theoretical bandwidth, but rarely as much
>_real_ bandwidth.  This is on a shared memory machine that can do real codes
>quite well.
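
As a rough check on the arithmetic quoted above, the sketch below multiplies
the quoted figures through. It assumes 64-bit (8-byte) words, which matches the
quoted "6 words, 48 bytes"; the function names are made up for illustration.
Taken at face value the product comes to about 24 GB/s per processor and about
768 GB/s for 32 processors, which is higher than the 2.5 and 100 gigabyte
figures in the quote. The last lines apply the same idea in reverse to the
"3.6 TB per second" over "8192 processors or so" figure mentioned further down
in this post.

```python
WORD_BYTES = 8  # assumption: 64-bit words (6 words = 48 bytes, as quoted)


def per_cpu_bandwidth(words_per_cycle, cycle_time_ns, word_bytes=WORD_BYTES):
    """Peak bytes per second one processor moves to and from memory."""
    cycles_per_second = 1e9 / cycle_time_ns
    return words_per_cycle * word_bytes * cycles_per_second


def aggregate_bandwidth(per_cpu_bytes_per_s, cpu_count):
    """Total peak bytes per second summed over all processors."""
    return per_cpu_bytes_per_s * cpu_count


# Quoted T932 figures: 4 reads + 2 writes per cycle, 2 ns cycle, 32 processors.
t932_cpu = per_cpu_bandwidth(words_per_cycle=6, cycle_time_ns=2.0)
t932_total = aggregate_bandwidth(t932_cpu, cpu_count=32)

# Quoted cluster figures: 3.6 TB/s total over roughly 8192 processors.
cluster_total = 3.6e12
cluster_cpu = cluster_total / 8192

print(f"T932 per processor:    {t932_cpu / 1e9:.1f} GB/s")
print(f"T932 32 processors:    {t932_total / 1e9:.1f} GB/s")
print(f"Cluster per processor: {cluster_cpu / 1e9:.2f} GB/s")
```

These are peak numbers only; they say nothing about what a real code sustains,
which is the point under dispute in this thread.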

>>
>>Note it's the same network that gets used for huge Cray T3E's, but a newer and
>>bigger version, that's all.
>
>T3E isn't a vector computer.

The processor used (alpha) was superscalar rather than vector, yet it achieves
the same main objective: effectively executing more than 1 instruction a cycle.

Itanium2, objectively seen, is a vector processor, as it executes 2 bundles at
once, though they call that IPF nowadays.

The x86-64 processors that are taking over now all execute 3 instructions a
cycle and deliver up to 2 flops a cycle.

>>Crays in vector configuration usually had, what was it, 4 cpu's or so?
>>Sometimes up to 128. Above that it was the T3E, which had alpha's.
>>
>>that one used quadrics usually :)
>>
>>However look to France now. New great supercomputer. 8192 processors or so.
>>Say 2048 nodes. You're looking at 3.6 TB per second bandwidth :)

>For a synthetic benchmark, not a real code, that's the problem with clusters so
>far...

It's the speed at which the memory delivers data to the CPUs.

Nothing synthetic.

>>Those Crays you remember were 100Mhz ones. Network could deliver of course
>>exactly what cpu could calculate.
>
>There was no "network" and the crays were 500mhz although on a fully pipelined
>vector machine that can do 5-10 operations per cycle that is not exactly a good
>measure of performance.

Operations don't count. Flops do.

There is actually a 1 GHz Cray here with 256 KB of cache.

>>Not so great if you look to the total number of Gflop it delivered. Nowadays the
>>big clusters, as all big supercomputers nowadays are clusters, are measured in
>>Tflop and one already in Pflop :)
>>
>>There is a 0.36 Pflop one now under construction :)
>>
>>Vincent

>Different computer for different applications.  Ask a real programmer which he
>would rather write code for...

They'll all pick the fastest machine, which is the one delivering the most
flops.

Vincent



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.