Author: Robert Hyatt
Date: 20:35:57 08/25/02
On August 25, 2002 at 22:08:26, Jeremiah Penery wrote:

>On August 25, 2002 at 21:50:48, Robert Hyatt wrote:
>
>>On August 25, 2002 at 11:21:31, Dan Andersson wrote:
>>
>>>>If you look at my response to Vincent, there are a few issues that need
>>>>addressing because NUMA introduces a much higher memory latency for
>>>>processors that are farther away from the actual memory word needed. This
>>>>has to be addressed to produce reasonable speedups.
>>>>
>>>You say much higher. The latency for one CPU is 80 ns. For MP one hop is 115 ns,
>>
>>First, I don't believe _any_ of those numbers from vendors. If they were
>>true, the vendors would use that _same_ low-latency memory on the uniprocessor
>>boxes. But they don't. Which says a lot about the "realness" of the numbers.
>
>Perhaps you have not read much about the Hammer architecture. The thing that so
>greatly reduces latency is that it has a memory controller on the processor die,
>which scales linearly in speed with the processor. The memory itself is the
>same as on any other box.

That was my point. The latency is not a controller issue. It is an issue of
how quickly you can dump a set of capacitors, measure the voltage, and put it
back. I doubt Hammer has solved that problem, because Cray certainly did not
with their machines...

>In all current processor/chipset configurations, the CPU has to send a memory
>request to the Northbridge of the motherboard, which runs at a low clockspeed.
>The northbridge has to send the request on to the main memory, which sends it
>back through the same channel. Hammer eliminates the northbridge setup
>completely - memory requests go directly from the processor to the memory banks,
>via a high-speed HyperTransport tunnel.

That's OK... but it doesn't solve the 100ns delay to dump capacitors...

>With multiple CPUs, an access goes through HyperTransport to whatever CPU is
>directly connected to the needed memory first, then proceeds the same way. Even
>with this extra step, it is AT LEAST as fast as current CPU-Northbridge-Memory
>setups (it is the same number of steps as that configuration then), because
>HyperTransport in general has lower latency than most (all?) current
>communication protocols.

Now you get to _the_ issue. For streaming memory requests, the above sounds
good. But for random reads/writes, the latency is not in the controller or
bus, it is in the memory chips themselves...

For chess I don't care about streaming memory references. That is something I
would be interested in for a large vector-type application, and that is what
Cray has been doing so well for years. But a 60 million dollar Cray _still_
can't overcome that random access latency. Neither will Hammer...

><Large snip>
>
>>Hopefully we will see some real numbers soon... But a memory controller on
>>chip speaks to problems with more than one chip...
>
>I eagerly await real numbers also. It's possible that the quoted numbers are
>lower than any real-world figure we may see, but I suspect that memory latency
>for Hammer systems will be considerably lower than any current setup, at the
>very least.

I suspect they will be _identical_ for random accesses, which is the kind of
accesses we mainly do in CC.
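To put a concrete face on "random": the big memory consumer in a chess program
is the transposition table, and every probe into it is addressed by a Zobrist
hash key. A rough sketch of such a probe looks like the following (this is not
Crafty's actual code; the names, fields and sizes are made up just to show the
shape of the access):

    #include <stdint.h>
    #include <stdlib.h>

    typedef struct {
        uint64_t key;     /* full Zobrist signature, for verification */
        short    score;   /* stored search score                      */
        short    depth;   /* draft this score was searched to         */
    } HashEntry;

    static HashEntry *hash_table;
    static uint64_t   hash_mask;   /* table size (a power of two) minus 1 */

    void HashInit(uint64_t entries) {      /* entries must be a power of 2 */
        hash_table = calloc(entries, sizeof(HashEntry));
        hash_mask  = entries - 1;
    }

    /* Probe the table for the current position.  The index is taken from
       a pseudo-random 64-bit Zobrist key, so back-to-back probes land on
       essentially random cache lines -- each one pays full DRAM latency. */
    int HashProbe(uint64_t zobrist_key, int depth, int *score) {
        HashEntry *h = &hash_table[zobrist_key & hash_mask];
        if (h->key == zobrist_key && h->depth >= depth) {
            *score = h->score;
            return 1;                      /* hit, score is usable */
        }
        return 0;                          /* miss, search the node */
    }

The index is, by design, a pseudo-random function of the position, and the
table is sized far beyond any cache, so essentially every probe is a full trip
to DRAM (and on a NUMA box, possibly a trip to another node's DRAM). A faster
controller or a faster link doesn't change what happens once the request
reaches the memory chips.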
>As for problems with more than one chip, it doesn't look to cause any kind of
>problems due to the way it's being handled with multiple HyperTransport tunnels.
>However, like anything else, we can only wait and see what real figures look
>like.

Two controllers == conflicts for the bus. More than two controllers == more
conflicts... That has always been a problem. One potential solution is
multi-ported memory. That has its own problems, however, as now you move the
conflicts into the memory bank itself...
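When real hardware shows up, anyone can measure the streaming vs. random
difference for themselves without anything exotic. A quick-and-dirty test like
the one below would do it (plain C, nothing vendor-specific; the array size and
iteration count are arbitrary, just chosen to be far larger than any cache).
The first loop streams through memory sequentially and is limited by bandwidth;
the second chases a random permutation, so every load has to complete before
the next address is even known, and it runs at roughly one full memory latency
per iteration. My bet is that the second number on Hammer looks a lot like the
second number on anything else.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N (16 * 1024 * 1024)   /* 16M longs -- far larger than any cache */

    int main(void) {
        long *a = malloc((size_t)N * sizeof(long));
        unsigned long sum = 0;
        long i, j, tmp, p = 0;
        clock_t t;

        if (a == NULL) return 1;

        /* Sattolo's shuffle: builds a single random cycle, so the pointer
           chase below visits all N elements in a random order and can never
           get trapped in a short (cacheable) loop.  rand() is crude, but it
           is good enough for this purpose.                                  */
        for (i = 0; i < N; i++) a[i] = i;
        for (i = N - 1; i > 0; i--) {
            j = rand() % i;
            tmp = a[i]; a[i] = a[j]; a[j] = tmp;
        }

        /* Pass 1: sequential streaming reads -- limited by bandwidth. */
        t = clock();
        for (i = 0; i < N; i++) sum += a[i];
        printf("sequential: %.2f sec\n", (double)(clock() - t) / CLOCKS_PER_SEC);

        /* Pass 2: dependent random reads -- each load supplies the address
           of the next one, so this is limited by raw memory latency.       */
        t = clock();
        for (i = 0; i < N; i++) p = a[p];
        printf("random:     %.2f sec\n", (double)(clock() - t) / CLOCKS_PER_SEC);

        printf("(%lu %ld)\n", sum, p);  /* keep the loops from being optimized away */
        free(a);
        return 0;
    }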