Computer Chess Club Archives


Subject: Re: Who can update about new 64 bits chip?

Author: Vincent Diepeveen

Date: 09:36:52 08/25/02


On August 25, 2002 at 10:09:33, Dan Andersson wrote:

>>
>>The multiprocessor versions of the Hammer will be what most guess it is.
>>It is more like a NUMA system. Getting data from local memory is very
>>fast (faster than RAM is now), but on the other hand getting data from
>>other nodes is a lot slower.
>>
>Not a lot slower actually. And factoring in the faster memory speeds you will

A factor of 4 or so at 8 processors, I guess.

If you don't find that a lot slower, the discussion is over.

>not be hit by it at all. Since the memory controllers and the communication
>ports are connected to the CPU core by the same type of bus. And the latency
>will be lower the faster the CPU goes. Down to an asymptotic constant limit. So

I don't know enough about hardware to understand why AMD is claiming
something like this for Hammer. It sounds like big BS to me personally,
unless they invented a new wheel.

>you need *no* complete redesign to run Crafty fast. No redesign at all,

Fast is a flexible word, Mr Andersson.

For me the important thing is: "how many nodes a second
does a single-CPU version get" versus "what nps
does the 8-processor version get".

If the difference is not a factor of 8, then the software obviously
needs a redesign.
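
The scaling test described above can be written down as a small
calculation. This is only a sketch: the function names and any nps
figures fed into it are hypothetical, not measurements from Crafty or
Diep.

```c
#include <stdio.h>

/* Speedup of the parallel search: nodes per second on n CPUs divided by
   nodes per second on one CPU. Names are hypothetical, for illustration. */
double nps_speedup(double nps_single, double nps_parallel)
{
    return nps_parallel / nps_single;
}

/* Parallel efficiency: 1.0 means the ideal factor n on n CPUs,
   anything clearly below that suggests the design does not scale. */
double nps_efficiency(double nps_single, double nps_parallel, int cpus)
{
    return nps_speedup(nps_single, nps_parallel) / (double)cpus;
}

/* Print a one-line scaling report for a given measurement. */
void print_scaling(double nps_single, double nps_parallel, int cpus)
{
    printf("speedup %.2f on %d cpus, efficiency %.2f\n",
           nps_speedup(nps_single, nps_parallel), cpus,
           nps_efficiency(nps_single, nps_parallel, cpus));
}
```

Calling print_scaling() with the single-CPU nps, the 8-CPU nps, and 8
gives the factor under discussion; a speedup well below 8 is the signal
that a redesign is needed.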

In the case of Crafty, Bob already answered in plain words what he feels
he must do. I would add an implementation-specific thought to that:
getting rid of the smp_lock variable as well, which blocks all the
other CPUs from splitting or doing a StopThread() while one CPU holds
it. If copying from P0 to Pn is really slow (and of course lock() is
also a lot slower on a NUMA design), then that obviously becomes an
exponentially growing problem as the number of CPUs rises.

Bob also already answered the issue of getting from multithreading to
multiprocessing, because it is of course not so smart to spend 2000
clocks just to get a cache line from another CPU that holds the move
generation tables. All of that must be done locally, of course (so the
easiest thing is to fork() Crafty a few times and then share the
hashtable and the 'tree' data structure).
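
A minimal sketch of the fork()-and-share idea, assuming a POSIX system:
the table is mapped with MAP_SHARED before forking, so every child sees
the same hash table, while everything else (move generation tables and
so on) becomes a private per-process copy. HashEntry,
alloc_shared_table() and run_workers() are hypothetical names and do
not reflect Crafty's real data layout.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc in strict modes */
#endif
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical hash entry, for illustration only. */
typedef struct {
    uint64_t key;
    int32_t  score;
    int32_t  depth;
} HashEntry;

/* Allocate a zero-filled table that stays shared across fork(). */
HashEntry *alloc_shared_table(size_t entries)
{
    void *p = mmap(NULL, entries * sizeof(HashEntry),
                   PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return (p == MAP_FAILED) ? NULL : (HashEntry *)p;
}

/* Fork up to `workers` children; child i stores one entry in slot i.
   Returns how many of those entries the parent can see afterwards. */
int run_workers(HashEntry *table, int workers)
{
    int started = 0;
    for (int i = 0; i < workers; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;                      /* could not fork more workers */
        if (pid == 0) {
            /* Child: all other data is now a private, local copy. */
            table[i].key   = (uint64_t)i + 1;
            table[i].score = 100 * i;
            table[i].depth = i;
            _exit(0);
        }
        started++;
    }
    while (wait(NULL) > 0)
        ;                               /* reap every child */
    int visible = 0;
    for (int i = 0; i < started; i++)
        if (table[i].key == (uint64_t)i + 1)
            visible++;
    return visible;
}
```

A real engine would also have to deal with concurrent writes to the
same entry and with placing the table's memory sensibly across nodes;
the sketch leaves both out.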

Technically speaking, you can of course rewrite multithreading to the
point that it looks more like multiprocessing, and vice versa. There is
a big gray area there, and the important thing to realize is that in the
case of Crafty it is really no big problem to do that.

Getting rid of smp_lock is a big problem, however, because if 4
processors call the function StopThread() at the same time, you add
zillions of race conditions to the program that are not there now (at
least not provably).
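
To illustrate the direction only: one common way to lean less on a
single global lock is to keep the shared state per split point and
update it atomically. The sketch below uses C11 atomics; SplitPoint,
split_init(), split_stop() and split_next_move() are hypothetical names,
this is not how Crafty or Diep do it, and it does not by itself remove
the races described above.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-split-point state; each split point carries its own
   flags, so stopping one does not serialize the whole machine. */
typedef struct {
    atomic_bool stop;       /* set once this split point is cut off */
    atomic_int  next_move;  /* index of the next unsearched move */
    int         move_count; /* number of moves at this split point */
} SplitPoint;

void split_init(SplitPoint *sp, int move_count)
{
    atomic_init(&sp->stop, false);
    atomic_init(&sp->next_move, 0);
    sp->move_count = move_count;
}

/* May be called by several CPUs at once; setting the flag is idempotent. */
void split_stop(SplitPoint *sp)
{
    atomic_store(&sp->stop, true);
}

/* Hand out the next move index, or -1 if stopped or no moves are left.
   The fetch-and-add guarantees no two CPUs get the same index. */
int split_next_move(SplitPoint *sp)
{
    if (atomic_load(&sp->stop))
        return -1;
    int idx = atomic_fetch_add(&sp->next_move, 1);
    return (idx < sp->move_count) ? idx : -1;
}
```

On a NUMA machine each such structure would ideally live in the memory
of the node that owns the split point, so that most accesses stay local.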

I hope you realize that rewriting Crafty from what it is now into
something that is 8 times faster on 8 NUMA CPUs is a lot more work than
just a recompile. I estimate it at 2 months of full-time work for a
skilled person; at least that's what I needed to port Diep from a
multiprocessor version (which, like Crafty, had an smp_lock) to
something that works great on an SGI NUMA machine now.

NUMA is obviously the software design to keep in mind for the future.
You don't need to get slowed down at all by using a NUMA-like
architecture; it's simply a different way of designing.

That doesn't take away that getting memory from a remote node is very slow.

>actually. But to get the absolute maximum you need to factor in the hardware.



>MvH Dan Andersson


