Author: Robert Hyatt
Date: 09:09:12 09/03/03
On September 03, 2003 at 02:53:36, Tony Werten wrote:

>On September 02, 2003 at 11:09:21, Robert Hyatt wrote:
>
>>On September 01, 2003 at 23:58:38, Jeremiah Penery wrote:
>>
>>>On September 01, 2003 at 23:53:20, Robert Hyatt wrote:
>>>
>>>>It is almost guaranteed that _all_ critical search data for _all_ threads will
>>>>be allocated in a single processor's local memory.
>>>
>>>That would be the worst possible usage of memory. Why in the world would a
>>>program perform like that?
>>
>>
>>Do you understand how parallel programming works? Suppose you want to
>>do this:
>>
>>TREE blocks[128];
>>
>>Where TREE is a big structure.
>>
>>That puts the blocks into consecutive memory addresses.
>>
>>On a NUMA machine that puts the blocks into one processor's local memory,
>>or it might split across two if you are near the end of one's memory.
>>
>>On a true SMP (non-NUMA) box, that works _perfectly_ and it is the way things
>>are done. On a NUMA box, it sucks.
>
>I do not know very much about this stuff, but I don't see the problem.
>
>Just malloc a local copy of TREE and copy the global TREE in it. Of course this
>isn't optimal, but should work very easy.
>
>Tony

It is more complicated than that due to the recursion... But the basic idea
is correct. In the Compaq port, I simply allocated split blocks locally for
each processor...

>
>>
>>As I said, it takes a _redesign_ of how memory is used, to make a NUMA
>>box run efficiently. Assumptions that are fine on any SMP box fail on a
>>NUMA box. I.e., Crafty runs just fine on a 32 CPU T90 from Cray. But it uses
>>a crossbar memory switch, not NUMA. Ditto for my dual/quad boxes here.
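
To make the "allocate split blocks locally" idea concrete, here is a rough sketch. It is not Crafty's code. It assumes POSIX threads and an OS that places a page on the node of the first thread to touch it (first-touch placement), and it assumes each worker stays on one processor, which real code would have to arrange by pinning. The TREE layout, NUM_THREADS, BLOCKS_PER_THREAD, local_blocks, worker_init and init_all_blocks are all made-up names for illustration. It also ignores the recursion issue mentioned above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NUM_THREADS       4   /* hypothetical thread count */
#define BLOCKS_PER_THREAD 32  /* 4 * 32 = 128, same total as blocks[128] */

/* Stand-in for the real TREE structure, which is far larger. */
typedef struct {
    int  thread_id;
    char search_state[4096];
} TREE;

/* One pointer per thread replaces the single contiguous TREE blocks[128]. */
static TREE *local_blocks[NUM_THREADS];

/* Runs on the worker itself, so the worker is the first to touch its pages.
   Under a first-touch policy (an assumption here) the pages then land in
   that processor's local memory. */
static void *worker_init(void *arg)
{
    int id = (int)(intptr_t)arg;
    TREE *mine = malloc(BLOCKS_PER_THREAD * sizeof *mine);
    if (mine == NULL)
        return NULL;
    memset(mine, 0, BLOCKS_PER_THREAD * sizeof *mine);  /* first touch */
    for (int i = 0; i < BLOCKS_PER_THREAD; i++)
        mine[i].thread_id = id;
    local_blocks[id] = mine;
    return mine;
}

/* Starts the workers and lets each allocate its own blocks.
   Returns 0 on success, -1 if any thread or allocation failed. */
static int init_all_blocks(void)
{
    pthread_t tid[NUM_THREADS];
    int started = 0;
    int ok = 1;

    for (; started < NUM_THREADS; started++) {
        if (pthread_create(&tid[started], NULL, worker_init,
                           (void *)(intptr_t)started) != 0)
            break;
    }
    for (int i = 0; i < started; i++) {
        void *result = NULL;
        pthread_join(tid[i], &result);
        if (result == NULL)
            ok = 0;
    }
    return (ok && started == NUM_THREADS) ? 0 : -1;
}
```

Calling init_all_blocks() once at startup leaves local_blocks[t] pointing at thread t's own blocks. The point of the design is only that the thread that will use a block is the one that allocates and first writes it, rather than one thread declaring a single global array.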
Last modified: Thu, 15 Apr 21 08:11:13 -0700