Computer Chess Club Archives


Subject: Re: Who can update about new 64 bits chip?

Author: Eugene Nalimov

Date: 19:42:31 08/26/02


I read that a separate memory controller is responsible for ~1/3 of the typical
load-to-use latency in modern CPUs. That is better than the 20% you mentioned,
but not dramatically better. OTOH, that means you can speed up memory accesses
in single-CPU systems by 1/3 without inventing a new memory type. You only have
to use a small part of the countless transistors you have in the CPU.

Go to http://www.compaq.com/alphaserver/whitepapers.html and take a look at the
EV7 whitepaper. EV7 (also known as Alpha 21364) is a CPU with an integrated
memory controller. They achieved 75ns load-to-use latency on RDRAM(!) memory.

Look at figure 5 on page 9. The system is NUMA, but it looks like for up to
24 CPUs you can build a crossbar with reasonable latency (~200ns), which will be
not [much] worse than the latency on modern 8-CPU systems, at a fraction of the
cost of the supporting chipsets and bridges.
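
To put the two figures above side by side, here is a rough sketch of the average access latency on such a NUMA box. It assumes a simple two-level model (75ns for local accesses, ~200ns for anything that crosses the crossbar) and a made-up "local fraction" parameter; the function name and the sample fractions are illustrative only and are not taken from the whitepaper.

```python
def average_latency_ns(local_fraction, local_ns=75.0, remote_ns=200.0):
    """Expected memory latency under a two-level NUMA model.

    local_fraction is the (assumed) share of accesses that hit local memory;
    the 75ns / 200ns defaults are the figures quoted in the post.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


if __name__ == "__main__":
    # Hypothetical workloads, from fully local to fully remote.
    for frac in (1.0, 0.75, 0.5, 0.0):
        print(f"{frac:.2f} local -> {average_latency_ns(frac):.1f} ns")
```

Real systems will not be this tidy (remote latency depends on hop count and contention), so treat the output as a back-of-the-envelope number only.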

Thanks,
Eugene

On August 26, 2002 at 18:08:47, Robert Hyatt wrote:

>On August 26, 2002 at 16:35:10, Bo Persson wrote:
>
>>On August 26, 2002 at 11:07:25, Robert Hyatt wrote:
>>
>>>On August 26, 2002 at 05:13:35, Vincent Lejeune wrote:
>>>
>>>>
>>>>Waiting for the real numbers ...
>>>
>>>
>>>Read that again, carefully.  "local memory". This is NUMA.  The penalty for
>>>accessing memory that is _not_ local is significant.  The penalty for accessing
>>>local memory is still 100ns or so, because nobody knows how to reduce
>>>resistance, capacitance and inductance together.
>>>
>>>When you have multiple processors there will be significant conflicts.  I don't
>>>know whether that "hypertransport bus" is full-duplex or not.  If it is, it
>>>might work OK for two processors, but not beyond two as there would be no easy
>>>way to manage more than two.
>>
>>Theoretically they could. The more-than-2-way Hammers, the Opteron, have 4 sets
>>of the hypertransport logic. Would work fine for quad boxes. The local memory
>>channel is also separate. They have a *lot* of pins...
>
>OK... If they do 4 channels.  This sounds like a transputer approach of
>course, where beyond 4 you run into the same problem as always if you only
>have four connections to play with...  Then you can try hyper-cube type
>approaches to use 16 nodes and 4 connections I suppose..  with more latency.
>
>>
>>> I assume it is a "normal bus" which means if
>>>the two processors want to access each other's local memory, one is definitely
>>>going to wait.  And that also means there is some sort of bus negotiation
>>>protocol which extends latency as well...
>>
>>Probably!
>>
>>
>>Bo Persson
>>bop2@telia.com
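
The hypercube remark in the quoted text (16 nodes, 4 connections each, with more latency) can be illustrated with a small sketch. It assumes the usual binary-hypercube numbering, where two nodes are linked when their IDs differ in exactly one bit; the function names are hypothetical and nothing here describes how any actual Hammer or transputer system is wired.

```python
def hypercube_neighbors(node, dimensions):
    """Nodes directly linked to `node`: flip each of the `dimensions` ID bits."""
    return [node ^ (1 << bit) for bit in range(dimensions)]


def hop_count(src, dst):
    """Minimum number of links between two nodes (Hamming distance of the IDs)."""
    return bin(src ^ dst).count("1")


if __name__ == "__main__":
    dims = 4                      # 4 links per node
    nodes = range(1 << dims)      # 2**4 = 16 nodes
    print("neighbors of node 0:", hypercube_neighbors(0, dims))
    worst = max(hop_count(a, b) for a in nodes for b in nodes)
    print("worst-case hops between any two nodes:", worst)
```

With 4 links per node, the farthest pair of the 16 nodes is 4 hops apart, which is the extra latency the quoted post alludes to.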



Last modified: Thu, 15 Apr 21 08:11:13 -0700
