Author: Jeremiah Penery
Date: 18:34:04 07/10/03
On July 10, 2003 at 15:59:42, Vincent Diepeveen wrote:

>On July 09, 2003 at 19:21:52, Jeremiah Penery wrote:
>
>>On July 09, 2003 at 08:25:39, Vincent Diepeveen wrote:
>>
>>>Nevertheless this machine is record breaking and always will be remembered for
>>>that. Assuming it is designed for big vectors, its latency is probably quite a
>>>bit worse, because if you optimize for huge transfers at once, then a single
>>>transfer is probably very pricey.
>>>
>>>So let's ignore the latency question; it simply wasn't designed for that.
>>
>>Instead of just guessing, why don't you go look it up? Information is widely
>>available.
>>
>>Here - http://www.sc.doe.gov/ascr/dongarra.pdf - the MPI_PUT latency is listed
>>as 6.63us. Everywhere else I've seen lists under 10us, with most being much
>>closer to 5us.
>
>When it goes through some central bottleneck, you can never avoid such huge
>latencies. For a supercomputer that can do OpenMP up to 2048 processors like the
>Earth machine (if I interpret the data in that PDF well), at 500MHz with 16
>instructions a clock (therefore called a vector processor), which also can be

No, it's called a vector processor because the vector unit uses 72 vector
registers, each holding 256 64-bit values, with multiple sets of vector units
(each with 6 instruction pipelines) designed to operate on them. Being called a
'vector processor' has absolutely nothing to do with the ability to do 16
instructions/clock.

BTW, the vector parts of the chip operate at 1GHz, from what I can tell. The
scalar part runs at 500MHz.

>achieved actually at 8 gflops, it is really a great machine for the matrix guys.
>Most likely that 6.63us latency is for huge lines of data, as they achieve
>12.xxGB a second with it.
>
>Note that MPI_PUT is a one-way function. It isn't *waiting* for data to get
>back.

There are a *lot* of other PDF, PPT, and HTML documents that give slightly
different figures.
How about this one:

Inter-node MPI communication - Latency 8.6us
http://wwwbode.cs.tum.edu/~gerndt/home/Research/PADC2002/Talks/Kerbyson.pdf
and at several other sites.

I see "bi-directional MPI communication latency" listed at 8us here:
http://www.lanl.gov/orgs/ccn/salishan2003/pdf/kerbyson.pdf

Here:
http://camelback-comparch.com/Scalable%20MicroSupercomputers%20Presentation.pdf
I see MPI latency listed at 5.6us.

>However, if we consider circumstances and the design of the stuff out there,
>that really isn't interesting. Interesting is that they can get 12.xx GB
>bandwidth with MPI_PUT.
>
>This stuff is not designed for chess programs.

No, but neither was the Pentium 4.

>So if we are busy with just getting random cache lines of say 128 bytes at
>most, then the latency will be more around 20us on this machine. That's not
>nice to say, however, as it is not designed for this.

You're making up numbers again.

>It is designed to put 12.8GB a second through the central router with MPI_PUT,
>and that is an incredible achievement for node-to-node.

I'm not claiming that this thing has the lowest latency of anything in the
world. I'm only saying that it has very low latency relative to other very
large machines.

I don't think that a properly vectorized chess program would scale all that
badly, even up to the maximum number of processors, because you can load a lot
of memory into the vector registers and use longer loops to hide remote memory
access. I'd guess 10% efficiency would be attainable. But of course, that is
only a guess, and impossible to prove right or wrong.
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.