Author: Eugene Nalimov
Date: 19:42:31 08/26/02
I read that a separate memory controller is responsible for ~1/3 of the typical load-to-use latency in modern CPUs. That is better than the 20% you mentioned, but not dramatically better.

OTOH that means you can speed up memory accesses in single-CPU systems by 1/3 without inventing a new memory type. You only have to use a small part of the countless transistors you have in the CPU.

Go to http://www.compaq.com/alphaserver/whitepapers.html and take a look at the EV7 whitepaper. EV7 (also known as Alpha 21364) is a CPU with an integrated memory controller. They achieved 75ns load-to-use latency on RDRAM(!) memory. Look at figure 5 on page 9.

The system is NUMA, but it looks like for up to 24 CPUs you can build a crossbar with reasonable latency (~200ns) that will not be [much] worse than the latency on modern 8-CPU systems, at a fraction of the cost of the supporting chipsets and bridges.

Thanks,
Eugene

On August 26, 2002 at 18:08:47, Robert Hyatt wrote:

>On August 26, 2002 at 16:35:10, Bo Persson wrote:
>
>>On August 26, 2002 at 11:07:25, Robert Hyatt wrote:
>>
>>>On August 26, 2002 at 05:13:35, Vincent Lejeune wrote:
>>>
>>>>
>>>>Waiting for the real numbers ...
>>>
>>>
>>>Read that again, carefully. "local memory". This is NUMA. The penalty for
>>>accessing memory that is _not_ local is significant. The penalty for accessing
>>>local memory is still 100ns or so, because nobody knows how to reduce
>>>resistance, capacitance and inductance together.
>>>
>>>When you have multiple processors there will be significant conflicts. I don't
>>>know whether that "hypertransport bus" is full-duplex or not. If it is, it
>>>might work OK for two processors, but not beyond two as there would be no easy
>>>way to manage more than two.
>>
>>Theoretically they could. The more-than-2-way Hammers, the Opteron, have 4 sets
>>of the hypertransport logic. Would work fine for quad boxes. The local memory
>>channel is also separate. They have a *lot* of pins...
>
>OK... If they do 4 channels. This sounds like a transputer approach of
>course, where beyond 4 you run into the same problem as always if you only
>have four connections to play with... Then you can try hyper-cube type
>approaches to use 16 nodes and 4 connections I suppose.. with more latency.
>
>>
>>> I assume it is a "normal bus" which means if
>>>the two processors want to access each other's local memory, one is definitely
>>>going to wait. And that also means there is some sort of bus negotiation
>>>protocol which extends latency as well...
>>
>>Probably!
>>
>>
>>Bo Persson
>>bop2@telia.com
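
To make the "16 nodes and 4 connections" hyper-cube remark in the quoted text concrete, here is a minimal sketch of the topology. It assumes nodes are numbered 0..15 and that two nodes are directly linked when their numbers differ in exactly one bit. The names `neighbors`, `hops`, `DIMS` and `NODES` are made up for illustration and do not come from any of the posts or from any vendor documentation. It only counts link traversals; it says nothing about how many nanoseconds a hop costs.

```python
def neighbors(node: int, dims: int) -> list[int]:
    """Nodes directly linked to `node`: flip each of the `dims` address bits."""
    return [node ^ (1 << bit) for bit in range(dims)]


def hops(src: int, dst: int) -> int:
    """Minimum link traversals between two nodes = number of differing bits."""
    return bin(src ^ dst).count("1")


DIMS = 4            # assumed: 4 links per node, as in the quoted remark
NODES = 1 << DIMS   # 2**4 = 16 nodes

# Worst-case and average distance from node 0 (the cube is symmetric,
# so any other starting node gives the same figures).
worst_case = max(hops(0, n) for n in range(NODES))
average = sum(hops(0, n) for n in range(1, NODES)) / (NODES - 1)

if __name__ == "__main__":
    print("links of node 0:", neighbors(0, DIMS))
    print("worst-case hops:", worst_case)
    print("average hops to another node:", round(average, 2))
```

With four links per node, a remote access in this arrangement can need up to four hops instead of one, which is the "more latency" the quoted text refers to.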
Last modified: Thu, 15 Apr 21 08:11:13 -0700