Author: Robert Hyatt
Date: 08:13:12 09/02/03
On September 02, 2003 at 07:17:58, Gian-Carlo Pascutto wrote:

>On September 01, 2003 at 23:58:38, Jeremiah Penery wrote:
>
>>On September 01, 2003 at 23:53:20, Robert Hyatt wrote:
>>
>>>It is almost guaranteed that _all_ critical search data for _all_ threads will
>>>be allocated in a single processor's local memory.
>>
>>That would be the worst possible usage of memory. Why in the world would a
>>program perform like that?
>
>Memory is divided in equal parts for NUMA-Opteron AFAIK, with
>each CPU owning one chunk.
>
>Crafty just allocates one continuous big chunk for search structures,
>and hence it's in one processor's RAM.
>
>Messy thing about NUMA is the large hardware dependence of the code
>you end up writing.

It is certainly messy.

>
>I'm curious about how to ensure that a chunk of memory you allocate
>is on your local CPU. Just splitting up the splitblock list in per CPU
>pieces, so each CPU has a part in local memory would already remove half of the
>latencies I guess. At the very least the CPU that created the splitblock has
>local access, whereas normally you risk everything goes over remote access.

How this is done varies from machine to machine. On the Compaq compiler I was
testing on, you used a different form of malloc() that says "I want local
memory to _this_ processor, not memory anywhere that is convenient."

>
>I didn't notice any problems (on the contrary!) when running on a 4-way NUMA
>Opteron box with my thing, but I'm much less dependent on shared data between
>threads, so even all-remote access isn't killing.

That is one potential benefit of avoiding lightweight threads. Of course, the
less you share, the more other issues rear up...

>
>--
>GCP
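To make the per-CPU splitblock idea quoted above concrete, here is a rough sketch in C. It is not Crafty's code. SplitBlock, SplitPool, alloc_on_cpu() and the pool_* functions are all made-up names for illustration, and alloc_on_cpu() is only a stand-in that calls plain malloc(). On a real NUMA box you would replace its body with whatever node-local allocator the platform offers (for example numa_alloc_onnode() from libnuma on Linux, or the vendor-specific malloc() variant mentioned above). The sizes are arbitrary as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CPUS       4   /* assumed: a 4-way box */
#define BLOCKS_PER_CPU 8   /* assumed: arbitrary pool size per CPU */

/* Hypothetical split block; a real engine stores far more state here. */
typedef struct {
    int  in_use;
    int  owner_cpu;          /* CPU whose local memory holds this block */
    char search_state[256];  /* placeholder for per-split search data */
} SplitBlock;

/* One separately allocated array per CPU instead of one big chunk. */
typedef struct {
    SplitBlock *blocks[MAX_CPUS];
} SplitPool;

/* Stand-in allocator (assumption): plain malloc() ignores the cpu argument.
 * Replace the body with the platform's "local to this node" allocation. */
void *alloc_on_cpu(int cpu, size_t bytes)
{
    (void)cpu;
    void *p = malloc(bytes);
    if (p != NULL)
        memset(p, 0, bytes);
    return p;
}

/* Release every per-CPU array; safe to call on a partly built pool. */
void pool_destroy(SplitPool *pool)
{
    for (int cpu = 0; cpu < MAX_CPUS; cpu++) {
        free(pool->blocks[cpu]);
        pool->blocks[cpu] = NULL;
    }
}

/* Build the pool, one allocation per CPU. Returns 0 on success, -1 on failure. */
int pool_init(SplitPool *pool)
{
    for (int cpu = 0; cpu < MAX_CPUS; cpu++)
        pool->blocks[cpu] = NULL;

    for (int cpu = 0; cpu < MAX_CPUS; cpu++) {
        SplitBlock *arr = alloc_on_cpu(cpu, BLOCKS_PER_CPU * sizeof *arr);
        if (arr == NULL) {
            pool_destroy(pool);
            return -1;
        }
        for (int i = 0; i < BLOCKS_PER_CPU; i++)
            arr[i].owner_cpu = cpu;
        pool->blocks[cpu] = arr;
    }
    return 0;
}

/* Take a free block, preferring the caller's own CPU and falling back to
 * the other CPUs only when the local pool is exhausted.
 * Not thread-safe as written; a real engine would need locking here. */
SplitBlock *pool_acquire(SplitPool *pool, int cpu)
{
    assert(cpu >= 0 && cpu < MAX_CPUS);
    for (int step = 0; step < MAX_CPUS; step++) {
        SplitBlock *arr = pool->blocks[(cpu + step) % MAX_CPUS];
        for (int i = 0; i < BLOCKS_PER_CPU; i++) {
            if (!arr[i].in_use) {
                arr[i].in_use = 1;
                return &arr[i];
            }
        }
    }
    return NULL;  /* every block on every CPU is busy */
}

/* Hand a block back to its pool. */
void pool_release(SplitBlock *block)
{
    block->in_use = 0;
}
```

The thread that creates a split point would call pool_acquire() with its own CPU number, so at least the creator works out of local memory, and only the helper threads pay for remote access. Whether that really removes about half of the latency depends on the machine and on how often the helpers touch the block.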
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.