Computer Chess Club Archives


Subject: Re: The need to unmake move

Author: Robert Hyatt

Date: 05:46:12 08/29/03


On August 28, 2003 at 19:12:52, Jeremiah Penery wrote:

>On August 28, 2003 at 11:39:51, Robert Hyatt wrote:
>
>>On August 28, 2003 at 01:00:22, Jeremiah Penery wrote:
>>
>>>On a 2-way Opteron, accessing non-local memory should be at least as fast as
>>>accessing memory on a single-cpu P4 or Athlon system.  For a 4-way Opteron, it
>>>still should not be worse, even if it requires 2 hops.
>>
>>Perhaps.  But don't forget, when you have two cpus, 1/2 the memory _is_ slower
>>than the other half.  By some fixed latency.  A poor algorithm will definitely
>>perform slower than a good one, because the good one won't fight that extra
>>latency while the poor one will hit it all the time.
>
>But it's not more latency than you get *best case* when using a traditional SMP
>setup.  So you can only gain, even with a "poor algorithm".

If you compare an SMP Xeon to a dual 486, you _also_ "win".

But my point was that with a NUMA architecture, you might win a lot less
than you could if the algorithm doesn't take into account the specific
architectural issues of a NUMA machine.


>
>>>Cache coherency is just as much a problem on SMP machines as on NUMA ones.
>>
>>No, it isn't, for the same reasons NUMA memory access is more problematic than
>>pure SMP access.  The cache controllers have the _same_ latency issue.  A
>>cache controller "way over there" takes much longer to "snoop/invalidate"
>>than one "right next door."
>>
>>So you run into the _same_ issue again.   The "farther apart" two processors
>>are, the less stuff you want to share in memory, because the cache coherency
>>problem is slower to handle...
>
>I don't know that I understand what you're saying, but I also don't think you
>understand the Opteron NUMA setup very well.  In a 2(4) CPU Opteron setup, every
>CPU (and memory bank) is *closer* to (or not farther from, in the worst case)
>each other CPU (or memory) than in a traditional SMP setup with a northbridge.
>Opterons are connected *directly*, rather than through a traditional bus.
>Latency between processors is not always uniform, but it is still faster than
>the traditional setup.  I don't see why uniformity is an issue, because even
>memory accesses in a single-CPU setup are far from uniform in latency.  The
>point is that the latency is lower than normal, regardless of the fact that it
>is non-uniform.


My point was, again, that you want most references from a CPU to go to its
local memory for maximum performance.  It's an issue on _all_ NUMA-type machines.
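
A rough back-of-the-envelope sketch of that point, in Python.  The latency
numbers and the local-access fractions are made up purely for illustration
(they are not measurements of an Opteron or of any other machine), and
mean_latency_ns is a hypothetical helper, not part of any library.  It only
shows that the average cost per memory reference drops as a larger share of
references stays on the local node:

```python
def mean_latency_ns(local_fraction, local_ns, remote_ns):
    """Average latency when `local_fraction` of references hit local memory."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


# Assumed latencies, chosen only for illustration (not measured anywhere).
LOCAL_NS = 100.0
REMOTE_NS = 150.0

# Data spread evenly over two nodes: about half the references go remote.
naive = mean_latency_ns(0.5, LOCAL_NS, REMOTE_NS)

# NUMA-aware placement: assume 90% of the references stay on the local node.
aware = mean_latency_ns(0.9, LOCAL_NS, REMOTE_NS)

print(f"naive placement:      {naive:.1f} ns per reference")
print(f"NUMA-aware placement: {aware:.1f} ns per reference")
```

Both placements may still beat an older bus-based SMP box, but under these
assumed numbers the one that ignores locality gives up part of the gain it
could have had.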



Last modified: Thu, 15 Apr 21 08:11:13 -0700
