Computer Chess Club Archives


Subject: Re: The need to unmake move

Author: Vincent Diepeveen

Date: 07:45:40 09/03/03

On September 03, 2003 at 00:07:37, Jeremiah Penery wrote:

>On September 02, 2003 at 22:54:49, Robert Hyatt wrote:
>
>>Maybe.  But I use threads.  And on NUMA threads are _bad_.  One example,
>>do you _really_ want to _share_ all the attack bitmap stuff?  That means it
>>is in one processor's local memory, but will be slow for all others.  What
>>about the instructions?  Same thing.
>
>After some thinking, it seems to me that the *average* memory access speed will
>be the same no matter where the data is placed, for anything intended to be

If n CPUs can access local memory in 280 ns (R14000), while accessing remote memory takes 6-7 us, then which is faster?
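Just to make that concrete, here is a back-of-the-envelope sketch in C that uses only the numbers in this thread (280 ns local, ~6 us remote worst case); the node counts are illustrative, not measurements of any particular box:

/* Average access latency when shared data is spread evenly over the
 * nodes: each CPU then finds 1/nodes of its accesses local and
 * (nodes-1)/nodes of them remote. */
#include <stdio.h>

int main(void) {
    const double local_ns  = 280.0;   /* local access, the R14000 figure above   */
    const double remote_ns = 6000.0;  /* remote access, ~6 us worst case, above  */

    for (int nodes = 2; nodes <= 8; nodes *= 2) {
        double avg_ns = (local_ns + (nodes - 1) * remote_ns) / nodes;
        printf("%d nodes: average access %.0f ns (%.1fx local)\n",
               nodes, avg_ns, avg_ns / local_ns);
    }
    return 0;
}

Already at 4 nodes the average comes out at more than 16x the local latency, so "the average is the same wherever you put it" is not much comfort.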

>shared between all processors (in a small NUMA configuration).  The reason for
>this is because what is local to one processor will be non-local to all others.

Unless each processor has its own local copy.

Just for the record: shipping a megabyte from one CPU to another and then accessing it locally is way faster than accessing it remotely at random.
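A minimal sketch of that "local copy" idea in C; make_local_copy() is an illustrative helper (not code from Diep or Crafty), and it assumes the usual first-touch page placement, so writing the copy from the calling thread puts the pages in that thread's local memory:

#include <stdlib.h>
#include <string.h>

/* Copy a read-only table into memory owned by the calling thread's node.
 * The memcpy() writes the freshly malloc'ed pages from this thread, so
 * with first-touch placement they end up local; every later lookup then
 * pays local latency instead of a remote round trip. */
unsigned char *make_local_copy(const unsigned char *master, size_t bytes) {
    unsigned char *copy = malloc(bytes);
    if (copy != NULL)
        memcpy(copy, master, bytes);  /* one streaming transfer over the interconnect */
    return copy;
}

Each searcher would call it once at startup per big read-only table (the one megabyte of the example above), instead of chasing the original through remote memory for the rest of the search.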

Thanks,
Vincent

>It doesn't matter if everything is local to the same processor or spread around,
>because the same percentage of total accesses will be non-local in any case
>(unless there is a disparity between the number of accesses each CPU is trying
>to accomplish).
>
>The only problem is that one processor's memory banks might get hammered, but
>that _is_ the same with an (similarly small) SMP configuration - all accesses go
>serially through one memory controller.

The memory banks you can access in parallel through the HUB. Reads cost nothing; it's just that the calling processor will be waiting a bit (like 6-7 us worst case, or 10-30 us for Bob's new 4-node machine).

>As machine size increases, of course, NUMA can run into more problems.  But then
>SMP has its own problems as well (cost and complexity of memory sub-system,
>mostly).

Not if you directly connect each node to each node, which is what Cray does.

That keeps latency very low, but it's $$$$.
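The $$$$ part is just combinatorics; a quick sketch of how the link count explodes for a full node-to-node mesh (nothing vendor-specific assumed here):

/* A full mesh of n nodes needs n*(n-1)/2 point-to-point links, so the
 * wiring grows quadratically, while a routed cc-NUMA fabric gets by
 * with far fewer links per node. */
#include <stdio.h>

int main(void) {
    for (int n = 4; n <= 64; n *= 2)
        printf("%2d nodes: %4d direct links\n", n, n * (n - 1) / 2);
    return 0;
}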

So for remote latency, cc-NUMA is trivially always slower than a Cray.

That you're still denying this is madness.

We're talking about a factor of 20 here: ~6 us remote versus 280 ns local is a factor of more than 20.
