Computer Chess Club Archives


Subject: Re: Latency versus Information Bandwidth: Questions

Author: Vincent Diepeveen

Date: 04:32:57 12/06/02


On December 05, 2002 at 01:14:18, Jeremiah Penery wrote:

>On December 04, 2002 at 23:23:32, Robert Hyatt wrote:
>
>>>Current AthlonMP chipsets also have a separate bus per CPU.  They use the same
>>>EV6 bus as Alpha processors did (or still do?).  The memory modules are shared,
>>>whereas Hammer will have separate memory modules for each processor.
>>
>>
>>The problem with that is it turns into a NUMA architecture which has its _own_
>>set of problems.  One cpu connected to one memory module means that the other
>>CPU can't get to it as efficiently...
>>
>>IE this doesn't offer one tiny bit of improvement over a SMP-type machine with
>>shared memory...  Unless the algorithm is specifically designed to attempt to
>>localize memory references and duplicate data that is needed by both threads
>>often...
>>
>>This might be an improvement for running two programs at once.  For one
>>program using two processors, NUMA offers additional challenges for the
>>parallel programmer...
>
>According to all documentation, which I have no reason to doubt, a non-local
>memory access in a Hammer system is just as fast as a memory access in a
>processor/chipset combination where the memory controller resides in the
>northbridge (i.e. all other x86 configurations).  Local memory accesses are
>quite a lot faster.  Therefore, the average case, even in 8-way machines that
>take up to 3 hops for a memory access, is still below that of any x86 machine of
>today.

If you take the documentation at face value, you are looking at
theoretical figures that do not account for the worst-case parts
of the configuration.
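
To make the average-versus-worst-case distinction concrete, here is a rough
sketch. Every number in it is made up for illustration: the local latency,
the per-hop cost and both access mixes are assumptions, not measurements of
Hammer or of any other machine.

```python
# Hypothetical figures for illustration only; none of these are measured.
LOCAL_NS = 100.0    # assumed latency of a local memory access
PER_HOP_NS = 50.0   # assumed extra latency for each hop to a remote node


def access_latency(hops):
    """Latency of one access that needs the given number of hops."""
    return LOCAL_NS + hops * PER_HOP_NS


def average_latency(hop_fractions):
    """Weighted average latency.

    hop_fractions[h] is the fraction of accesses that need h hops;
    the fractions must sum to 1.
    """
    assert abs(sum(hop_fractions) - 1.0) < 1e-9
    return sum(f * access_latency(h) for h, f in enumerate(hop_fractions))


def worst_case_latency(hop_fractions):
    """Latency of the farthest access that actually occurs in the mix."""
    farthest = max(h for h, f in enumerate(hop_fractions) if f > 0)
    return access_latency(farthest)


# Two assumed access mixes for a machine with up to 3 hops:
spread_out = [0.125, 0.375, 0.375, 0.125]  # data scattered over all nodes
localized = [0.90, 0.05, 0.04, 0.01]       # data mostly kept on the local node

for name, mix in (("spread out", spread_out), ("localized", localized)):
    print(name, average_latency(mix), worst_case_latency(mix))
```

The average depends entirely on how many accesses stay local, which is a
property of the program and not of the hardware, while the worst case is the
same for both mixes. A single quoted average hides that difference.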

Bob is closer to the truth here than you might guess. As soon as you
run on supercomputers that advertise a certain theoretical peak
performance and test it yourself, the practical peak can be up to
50 times slower than the theoretical figures suggest.

So on paper this is way faster and even works with up to 8 CPUs (which
we are unlikely ever to see working). Like good propagandists, those
papers are not going to tell you about the weak spots in the design
that prevent the *theoretical* performance from happening in reality.

If they get this dual-CPU system to work, we will see what its speed is.

For now I assume, as Bob does, that it's a cluster.

Note that it's nearly impossible to get an 8-CPU machine to work with
that architecture. Imagine how complex its design will be.

Which OS will work on that?

Best regards,
Vincent







