Computer Chess Club Archives


Subject: Re: Cray

Author: Jeremiah Penery

Date: 18:34:04 07/10/03


On July 10, 2003 at 15:59:42, Vincent Diepeveen wrote:

>On July 09, 2003 at 19:21:52, Jeremiah Penery wrote:
>
>>On July 09, 2003 at 08:25:39, Vincent Diepeveen wrote:
>>
>>>Nevertheless this machine is record breaking and always will be remembered for
>>>that. Assuming it is designed for big vectors it's quite a bit slower in latency
>>>then because if you optimize for huge transfers at once then a single transfer
>>>is probably very pricey.
>>>
>>>So let's ignore the latency question, it wasn't designed for it simply.
>>
>>Instead of just guessing, why don't you go look it up?  Information is widely
>>available.
>>
>>Here - http://www.sc.doe.gov/ascr/dongarra.pdf - the MPI_PUT latency is listed
>>as 6.63us.  Everywhere else I've seen lists under 10us, with most being much
>>closer to 5us.
>
>When it goes to some central bottleneck then you never can avoid such huge
>latencies. For a supercomputer that can do OpenMP up to 2048 processors like the
>Earth machine (if I interpret the data in that pdf well) then at 500MHz with 16
>instructions a clock (therefore called vector processor) which also can be

No, it's called a vector processor because it has 72 vector registers, each
holding 256 64-bit values, and multiple sets of vector units (each with 6
instruction pipelines) designed to operate on them.  Being called a 'vector
processor' has absolutely nothing to do with the ability to do 16
instructions/clock.

BTW, the vector parts of the chip operate at 1GHz, from what I can tell.  The
scalar part is 500MHz.
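
For reference, the figures above can be checked with a little arithmetic. This is only a sketch using the numbers quoted in this thread (72 registers, 256 64-bit values each, 8 gflops peak, 500MHz or 1GHz clock); the function names are made up for illustration and nothing here comes from vendor documentation.

```python
# Numbers taken from the thread; treat them as quoted claims, not verified specs.
VECTOR_REGISTERS = 72          # vector registers per processor
ELEMENTS_PER_REGISTER = 256    # 64-bit values held by each register
BYTES_PER_ELEMENT = 8          # 64 bits


def vector_register_bytes(registers=VECTOR_REGISTERS,
                          elements=ELEMENTS_PER_REGISTER,
                          element_bytes=BYTES_PER_ELEMENT):
    """Total capacity of the vector register file in bytes."""
    return registers * elements * element_bytes


def results_per_clock(peak_flops, clock_hz):
    """Floating-point results per clock needed to reach a given peak rate."""
    return peak_flops / clock_hz


if __name__ == "__main__":
    total = vector_register_bytes()
    print(f"vector register file: {total} bytes ({total // 1024} KiB)")
    print(f"8 gflops at 500MHz: {results_per_clock(8e9, 500e6)} per clock")
    print(f"8 gflops at 1GHz:   {results_per_clock(8e9, 1e9)} per clock")
```

The register file works out to 147456 bytes (144 KiB) per processor. The "16 per clock" figure only follows if the 8 gflops peak is divided by the 500MHz scalar clock; if the vector units do run at 1GHz, the same peak needs 8 results per clock.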

>achieved actually at 8 gflops it is really a great machine for the matrix guys.
>Most likely that 6.63us latency is for huge lines of data as they achieve
>12.xxGB a second with it.
>
>Note that MPI_PUT is a one way function. It isn't *waiting* for data to get
>back.

There are a *lot* of other PDF, PPT, and HTML documents that give slightly
different figures.

How about this one, which also shows up at several other sites:
Inter-node MPI communication - Latency  8.6us

http://wwwbode.cs.tum.edu/~gerndt/home/Research/PADC2002/Talks/Kerbyson.pdf

I see "bi-directional MPI communication latency" listed at 8us here:
http://www.lanl.gov/orgs/ccn/salishan2003/pdf/kerbyson.pdf

Here:
http://camelback-comparch.com/Scalable%20MicroSupercomputers%20Presentation.pdf
I see MPI latency listed at 5.6us.

>However if we consider circumstances and the design of the stuff out there that
>really isn't interesting. Interesting is that they can get 12.xx GB bandwidth
>with MPI_PUT.
>
>This stuff is not designed for chessprograms.

No, but neither was the Pentium4.

>So if we are busy with just getting random cache lines of say 128 bytes at most,
>then the latency will be more around 20 us at this machine. That's not nice to
>say however as it is not designed for this.

You're making up numbers again.

>It is designed to put 12.8GB through the central router with a MPI_PUT a second
>and that is an incredible achievement for node to node.

I'm not claiming that this thing has the lowest latency of anything in the
world.  I'm only saying that it is very low latency, relative to other very
large machines.  I don't think that a properly vectorized chess program would
scale all that badly, even up to the maximum number of processors, because you
can load a lot of memory into the vector registers and use longer loops to hide
remote memory access.  I'd guess 10% efficiency would be attainable.  But of
course, that is only a guess, and impossible to prove right or wrong.
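
To put the latency and bandwidth figures from this thread side by side, here is a minimal sketch of the usual linear cost model for a single transfer (time = latency + size / bandwidth). The model itself is an assumption, as is applying the 6.63us and 12.8GB/s figures to the same transfer; it does not settle which message size the quoted latency was measured for, and the function names are illustrative only.

```python
# Figures quoted in the thread; the linear model below is an assumption.
LATENCY_S = 6.63e-6        # MPI_PUT latency quoted above, in seconds
BANDWIDTH_BPS = 12.8e9     # node-to-node bandwidth quoted above, bytes/second


def transfer_time(nbytes, latency_s=LATENCY_S, bandwidth_bps=BANDWIDTH_BPS):
    """Estimated time for one transfer: fixed latency plus size / bandwidth."""
    return latency_s + nbytes / bandwidth_bps


def effective_bandwidth(nbytes, latency_s=LATENCY_S, bandwidth_bps=BANDWIDTH_BPS):
    """Bytes per second actually delivered for a single transfer of nbytes."""
    return nbytes / transfer_time(nbytes, latency_s, bandwidth_bps)


def half_bandwidth_size(latency_s=LATENCY_S, bandwidth_bps=BANDWIDTH_BPS):
    """Message size at which effective bandwidth reaches half of the peak."""
    return latency_s * bandwidth_bps


if __name__ == "__main__":
    for size in (128, 4096, 1 << 20):
        t_us = transfer_time(size) * 1e6
        bw_mb = effective_bandwidth(size) / 1e6
        print(f"{size:>8} bytes: {t_us:8.2f} us, {bw_mb:10.1f} MB/s effective")
    print(f"half of peak bandwidth at about {half_bandwidth_size():.0f} bytes")
```

Under this model a 128-byte transfer is almost pure latency, and only transfers of tens of kilobytes or more get near the peak bandwidth, which is why longer vector loops that fetch more per remote access would help hide the cost.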



Last modified: Thu, 15 Apr 21 08:11:13 -0700