Computer Chess Club Archives


Subject: Re: Latency versus Information Bandwidth: Questions

Author: Robert Hyatt

Date: 09:03:09 12/06/02


On December 06, 2002 at 07:32:57, Vincent Diepeveen wrote:

>On December 05, 2002 at 01:14:18, Jeremiah Penery wrote:
>
>>On December 04, 2002 at 23:23:32, Robert Hyatt wrote:
>>
>>>>Current AthlonMP chipsets also have a separate bus per CPU.  They use the same
>>>>EV6 bus as Alpha processors did (or still do?).  The memory modules are shared,
>>>>whereas Hammer will have separate memory modules for each processor.
>>>
>>>
>>>The problem with that is it turns into a NUMA architecture which has its _own_
>>>set of problems.  One cpu connected to one memory module means that the other
>>>CPU can't get to it as efficiently...
>>>
>>>I.e. this doesn't offer one tiny bit of improvement over an SMP-type machine
>>>with shared memory...  Unless the algorithm is specifically designed to attempt
>>>to localize memory references and duplicate data that is needed by both
>>>threads often...
>>>
>>>This might be an improvement for running two programs at once.  For one
>>>program using two processors, NUMA offers additional challenges for the
>>>parallel programmer...
>>
>>According to all documentation, which I have no reason to doubt, a non-local
>>memory access in a Hammer system is just as fast as a memory access in a
>>processor/chipset combination where the memory controller resides in the
>>northbridge (i.e. all other x86 configurations).  Local memory accesses are
>>quite a lot faster.  Therefore, the average case, even in 8-way machines that
>>take up to 3 hops for a memory access, is still below that of any x86 machine of
>>today.
>
>If you read the documentation as it is, you are confronted with
>theoretical data which doesn't take into account any part of
>the configuration that is worst case.
>
>Bob is nearer the truth here than you might want to guess, because
>as soon as you run on those supercomputers with a certain theoretical
>peak performance and test it yourself, the practical peak
>is up to 50 times slower than the theoretical data suggests.
>
>So on paper this is way faster and even works with up to 8 CPUs (which
>we are unlikely ever to see working). Like good propagandists, those papers
>are not going to tell you about the weak spots in the design which prevent
>that *theoretic* performance from happening in reality.
>
>If they get this dual CPU to work, we will see what its speed is.
>
>For now I assume, as Bob does, that it's a cluster.
>
>Note that it's nearly impossible to get an 8 CPU machine to work with
>that architecture. Imagine how complex its design will be.
>
>Which OS will work on that?
>
>Best regards,
>Vincent


Linux probably, as the newer kernels now have a NUMA option within the config
stuff to provide drivers to make things at least begin to work.  Whether the
architecture can deliver any real performance is another issue...
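
To put rough numbers on the local-versus-remote argument quoted above, here is a minimal Python sketch of the average-latency arithmetic. The function names and all three latency figures are hypothetical, picked only to make the arithmetic easy to follow; they are not measurements of Hammer, AthlonMP, or any other machine. The sketch also assumes a remote access is slower than a shared-bus access, which is exactly the point in dispute in this thread.

```python
def average_latency_ns(local_fraction, local_ns, remote_ns):
    """Expected latency of one memory access on a two-node NUMA machine.

    local_fraction is the share of accesses that hit the CPU's own memory.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


def break_even_local_fraction(local_ns, remote_ns, shared_ns):
    """Local fraction at which NUMA average latency equals the shared-bus latency."""
    if remote_ns <= local_ns:
        raise ValueError("this model assumes remote accesses are slower than local ones")
    return (remote_ns - shared_ns) / (remote_ns - local_ns)


# Hypothetical figures for illustration only (not measured on any real system).
LOCAL_NS = 80.0    # access to memory attached to this CPU
REMOTE_NS = 160.0  # access to memory attached to the other CPU
SHARED_NS = 120.0  # access through a conventional shared northbridge

for fraction in (0.25, 0.50, 0.75, 0.95):
    numa_ns = average_latency_ns(fraction, LOCAL_NS, REMOTE_NS)
    verdict = "faster than" if numa_ns < SHARED_NS else "no faster than"
    print(f"{fraction:.0%} local: {numa_ns:.1f} ns average, {verdict} shared bus")

print("break-even local fraction:",
      break_even_local_fraction(LOCAL_NS, REMOTE_NS, SHARED_NS))
```

Under these made-up figures the NUMA machine only comes out ahead once more than half of the accesses stay local, which is why the algorithm has to be designed to localize its memory references. If a remote access really is no slower than a shared-bus access, as the documentation is said to claim, the break-even point drops to zero and NUMA never loses in this simple model.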


