Computer Chess Club Archives


Subject: Re: Latency versus Information Bandwidth: Questions

Author: Matt Taylor

Date: 16:16:47 12/05/02


On December 05, 2002 at 18:10:40, Robert Hyatt wrote:

>On December 05, 2002 at 14:52:31, Matt Taylor wrote:
>
>>On December 05, 2002 at 10:28:29, Robert Hyatt wrote:
>>
>>>On December 05, 2002 at 01:14:18, Jeremiah Penery wrote:
>>>
>>>>On December 04, 2002 at 23:23:32, Robert Hyatt wrote:
>>>>
>>>>>>Current AthlonMP chipsets also have a separate bus per CPU.  They use the same
>>>>>>EV6 bus as Alpha processors did (or still do?).  The memory modules are shared,
>>>>>>whereas Hammer will have separate memory modules for each processor.
>>>>>
>>>>>
>>>>>The problem with that is it turns into a NUMA architecture which has its _own_
>>>>>set of problems.  One cpu connected to one memory module means that the other
>>>>>CPU can't get to it as efficiently...
>>>>>
>>>>>I.e., this doesn't offer one tiny bit of improvement over an SMP-type machine with
>>>>>shared memory...  Unless the algorithm is specifically designed to attempt to
>>>>>localize memory references and duplicate data that is needed by both threads
>>>>>often...
>>>>>
>>>>>This might be an improvement for running two programs at once.  For one
>>>>>program using two processors, NUMA offers additional challenges for the
>>>>>parallel programmer...
>>>>
>>>>According to all documentation, which I have no reason to doubt, a non-local
>>>>memory access in a Hammer system is just as fast as a memory access in a
>>>>processor/chipset combination where the memory controller resides in the
>>>>northbridge (i.e. all other x86 configurations).  Local memory accesses are
>>>>quite a lot faster.  Therefore, the average case, even in 8-way machines that
>>>>take up to 3 hops for a memory access, is still below that of any x86 machine of
>>>>today.
>>>
>>>
>>>When I see one of those deliver that performance, I'll be a believer.  Even Cray
>>>couldn't make that happen without going to a many-ported memory, which is
>>>ridiculously expensive compared to a PC.
>>
>>They'll deliver performance because they're -NOT- a NUMA architecture. It's a
>>crossbar scheme. Each CPU is equipped with a crossbar.
>>
>>-Matt
>
>That's fine, but that is not the same thing as saying "each processor has a
>dedicated bus to its own memory".

Each Opteron chip has a crossbar. It's not quite the same, but in the same sense
that a node in a NUMA system has a dedicated memory bus, so does Opteron.
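
To put rough shape on the average-case latency argument quoted above, here is a
minimal sketch of a weighted-average model. Everything in it is an assumption
for illustration: the function name, the idea that each hop adds a fixed cost,
and especially the numbers, which are placeholders and not measurements of any
real Hammer or Athlon system.

```python
def average_latency_ns(local_ns, per_hop_ns, hop_fractions):
    """Expected memory latency under a simple fixed-cost-per-hop model.

    local_ns      -- hypothetical latency of an access to local memory
    per_hop_ns    -- hypothetical extra latency added by each hop
    hop_fractions -- hop_fractions[h] is the fraction of accesses that
                     need h hops (index 0 means local memory)
    """
    if abs(sum(hop_fractions) - 1.0) > 1e-9:
        raise ValueError("hop_fractions must sum to 1")
    return sum(
        fraction * (local_ns + hops * per_hop_ns)
        for hops, fraction in enumerate(hop_fractions)
    )


if __name__ == "__main__":
    # Placeholder values only: a 2-way box where half of all accesses
    # land in the other CPU's memory, one hop away.
    print(average_latency_ns(80.0, 40.0, [0.5, 0.5]))
```

The point of the model is only that the average depends on how often a thread
goes off-node. A program that keeps most of its references local pushes the
first fraction toward 1 and the average toward the local figure.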

-Matt



Last modified: Thu, 15 Apr 21 08:11:13 -0700
