Computer Chess Club Archives

Subject: Re: The need to unmake move

Author: Robert Hyatt

Date: 09:08:03 09/03/03

On September 03, 2003 at 10:38:05, Vincent Diepeveen wrote:

>On September 01, 2003 at 23:53:20, Robert Hyatt wrote:
>
>>On August 29, 2003 at 19:56:34, Jeremiah Penery wrote:
>>
>>>On August 29, 2003 at 18:40:51, Eugene Nalimov wrote:
>>>
>>>>On August 29, 2003 at 18:32:46, Jeremiah Penery wrote:
>>>>
>>>>>Of course I know that.  My point is that with Opteron, even if you are accessing
>>>>>non-local memory *always*, you are not accessing it slower than you would with,
>>>>>say, a traditional SMP machine (2x Xeon, for instance).
>>>>>Of course you can do a lot better - all I'm saying is that there's no way you're
>>>>>going to be doing worse.
>>>>>
>>>>>Either way you win, even with a crappy NUMA algorithm.
>>>>
>>>>I am not so sure. With some NUMA implementations each memory bank has limited
>>>>bandwidth, so if you happened to allocate all the critical data in one node's
>>>>memory you'll overload its memory controller.
>>>>
>>>>I had seen a case where an SMP application was blindly ported to a 32-CPU NUMA
>>>>system (8 nodes, 4 64-bit CPUs per node, 256GB RAM total). The application ran
>>>>much slower on 32 CPUs than on a single CPU.
>>>
>>>I'm not talking about "some NUMA implementations".  I'm talking about 2-4
>>>processor Opteron implementation.  It should never have any of the problems you
>>>describe.  Indeed, you can see from SPECrate that it scales very nearly as well
>>>as Itanium, and that is with compilers and OSes that are still not very
>>>NUMA-aware or well tuned for AMD64.
>>
>>Look at the SPEC programs.  Then look at _the_ problem I mentioned for Crafty.
>>It is almost guaranteed that _all_ critical search data for _all_ threads will
>>be allocated in a single processor's local memory.  That is going to be a hot-
>>spot and the fancy redundant memory controllers will _not_ be able to hide that.
>>
>>You can't do four simultaneous memory reads to a single bank.  Yet Crafty is
>>going to demand just that.  And performance is going to suffer.  _significantly_.
>
>Oh, I'm not sure about Opteron, but on the quads of the Origin 3800 (each node
>is a quad) you can do READS in parallel.

Yes, but not to the _SAME_ bank of memory.

Same problem on the Cray.  But the Cray has _many_ banks of memory, which is
why it costs way more than the Origin boxes.
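
To make the bank conflict concrete, here is a toy model in C. It assumes each bank services one read per cycle and that reads to different banks proceed fully in parallel. Both are simplifications made up for illustration, not a description of the actual Opteron, Origin, or Cray memory systems, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model (assumption): a bank services one read per cycle, and
 * distinct banks work in parallel.  bank_of[i] is the bank hit by
 * read i.  Returns the number of cycles needed to service all n
 * reads, which is the length of the longest per-bank queue.
 */
unsigned cycles_to_service(const unsigned *bank_of, size_t n,
                           unsigned num_banks)
{
    unsigned worst = 0;

    assert(bank_of != NULL || n == 0);

    for (unsigned b = 0; b < num_banks; b++) {
        unsigned queued = 0;

        /* Count how many reads are waiting on bank b. */
        for (size_t i = 0; i < n; i++)
            if (bank_of[i] == b)
                queued++;

        if (queued > worst)
            worst = queued;
    }
    return worst;
}
```

Under this model, four CPUs reading from four different banks finish in one cycle, while four CPUs reading from the same bank take four cycles: the reads are serialized no matter how many memory controllers the machine has elsewhere.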

>
>In diep i profit from this.
>
>But about local memory you are correct of course when talking about internode
>traffic.
>
>>It is fixable.  But it isn't fixed in the current implementation.
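
The thread does not say what the fix would look like. One common approach, sketched below purely as an assumption, is to have each search thread allocate and first touch its own data, so that an OS with a first-touch page placement policy puts those pages in that thread's local memory instead of all in one node. The names (`thread_state`, `worker_init`, `init_per_thread_state`, `TREE_BYTES`) are hypothetical and are not Crafty's.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_THREADS 4
#define TREE_BYTES  (1u << 20)  /* hypothetical size of one thread's search data */

/* Hypothetical per-thread search data; not Crafty's actual layout. */
static void *thread_state[MAX_THREADS];

static void *worker_init(void *arg)
{
    size_t id = (size_t)(uintptr_t)arg;

    assert(id < MAX_THREADS);

    /*
     * Allocate and write the memory from the thread that will use it.
     * Assumption: the OS places a page on the node of the CPU that
     * first touches it, and the thread stays on that node afterwards.
     */
    void *p = malloc(TREE_BYTES);
    if (p != NULL)
        memset(p, 0, TREE_BYTES);
    thread_state[id] = p;
    return NULL;
}

/* Starts n workers that each set up their own data.  Returns 0 on success. */
int init_per_thread_state(size_t n)
{
    pthread_t tid[MAX_THREADS];
    size_t started = 0;
    int ok = 1;

    if (n > MAX_THREADS)
        return -1;

    for (; started < n; started++) {
        if (pthread_create(&tid[started], NULL, worker_init,
                           (void *)(uintptr_t)started) != 0) {
            ok = 0;
            break;
        }
    }
    /* Wait only for the threads that were actually created. */
    for (size_t i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    for (size_t i = 0; ok && i < n; i++)
        if (thread_state[i] == NULL)
            ok = 0;

    return ok ? 0 : -1;
}
```

In this sketch the workers exit right after initialization to keep it short. In a real engine the same long-lived search thread would do the allocation and then keep running on its node, otherwise the placement buys nothing.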
