Author: Vincent Diepeveen
Date: 09:36:52 08/25/02
On August 25, 2002 at 10:09:33, Dan Andersson wrote:

>>
>>The multiprocessor versions of the Hammer will be what most guess it is.
>>It is more like a NUMA system. Getting data from local memory is very
>>fast (faster than RAM is now), but on the other hand getting data from
>>other nodes is a lot slower.
>>
>Not a lot slower actually. And factoring in the faster memory speeds you will

A factor of 4 or so at 8 processors, I guess. If you don't find that a lot slower, this discussion is over.

>not be hit by it at all. Since the memory controllers and the communication
>ports are connected to the CPU core by the same type of bus. And the latency
>will be lower the faster the CPU goes. Down to an asymptotic constant limit. So

I didn't pick up enough about hardware to understand why AMD is claiming something like this for Hammer. It sounds like big BS to me personally, unless they have invented a new kind of wheel.

>you need *no* complete redesign to run Crafty fast. No redesign at all,

Fast is a flexible word, Mr Andersson. For me the important thing is: "how many nodes a second does the single-CPU version get" versus "what NPS does the 8-processor version get". If the difference is not a factor of 8, then the software obviously needs a redesign.

In the case of Crafty, Bob already answered in plain words what he feels he must do. I would add one implementation-specific thought: also get rid of the smp_lock variable, which prevents the other CPUs from splitting or doing a StopThread() while one CPU holds it. If copying from P0 to Pn is really slow (and of course lock() is also a lot slower on a NUMA design), then that obviously becomes a problem that grows rapidly as the number of CPUs rises.

Bob also already answered the question of moving from multithreading to multiprocessing, because it is of course not very smart to spend 2000 clocks just to fetch a cache line holding the move generation tables from another CPU.
All of that must be done locally, of course (so the easiest approach is to fork() Crafty a few times and then share the hash table and the 'tree' data structure). Technically speaking, of course, you can rewrite a multithreaded program until it looks more like a multiprocessing one, and vice versa. There is a big gray area there, and the important thing to realize is that in the case of Crafty it is really no big problem to do that.

Getting rid of smp_lock, however, is a problem, because if 4 processors call the function StopThread() at the same time, you add zillions of race conditions to the program that are not there now (at least not provably).

I hope you realize that rewriting Crafty from what it is now into something that is 8 times faster on 8 NUMA CPUs is a lot more work than just a recompile. I estimate it at 2 months of full-time work for a skilled person; at least that's what I needed to port DIEP from a multiprocessor version (which, like Crafty, had an smp_lock) to something that now works great on an SGI NUMA machine.

NUMA is obviously the design to keep in mind when writing software for the future. You don't need to be slowed down at all by using a NUMA-like architecture; it's simply a different way of designing. That doesn't change the fact that getting memory from a remote node is very slow.

>actually. But to get the absolute maximum you need to factor in the hardware.
>
>Best regards,
>Dan Andersson
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.