Author: Robert Hyatt
Date: 14:52:07 08/18/05
On August 18, 2005 at 14:43:46, Eugene Nalimov wrote:

>On August 18, 2005 at 06:15:15, Robert Hyatt wrote:
>
>>On August 16, 2005 at 19:00:48, Eugene Nalimov wrote:
>>
>>>On August 15, 2005 at 22:19:36, Robert Hyatt wrote:
>>>
>>>>In NUMA Linux, when I malloc() or shmget() or otherwise allocate any kind of
>>>>memory, it isn't actually allocated on a specific node until the page is
>>>>faulted in on a reference. This lets me shmget() the TREE data for each
>>>>process before I fork() the processes, then each process initializes its own
>>>>TREE blocks, which faults them into the physical memory on the node where
>>>>that particular process is running.
>>>>
>>>>Does Windows behave the same way, or is the mallocInterleaved() approach
>>>>currently used in Crafty the best approach? I'm going to have to do a little
>>>>tweaking to make the current program approach behave on Windows, and if Windows
>>>>allocates physical memory like Linux, it makes the approach work on both, if
>>>>not, oh well...
>>>
>>>Look at the code I wrote. There are 2 functions:
>>>
>>>void *WinMalloc(size_t cbBytes, int iThread)
>>>void *WinMallocInterleaved(size_t cbBytes, int cThreads)
>>>
>>>Basically what is done in the first one is:
>>>* remember current CPU affinity mask
>>>* force current thread to be executed on CPU#iThread
>>>* allocate memory
>>>* fill it with zeroes, so it will be committed
>>>* restore CPU affinity mask
>>>
>>>The second function is very similar:
>>>* remember current CPU affinity mask
>>>* loop for CPU 0..N
>>>  * force current thread to be executed on that CPU
>>>  * allocate some memory
>>>  * fill it with zeroes, so it will be committed
>>>* restore CPU affinity mask
>>>
>>>Thanks,
>>>Eugene
>>
>>
>>I understood that part. What wasn't clear was this:
>>
>>Suppose I malloc() everything up front, but do not touch it. Then as threads
>>are spawned, they zero their own "split blocks" which on Linux causes those
>>pages to be "faulted in" to the resident set, and the physical RAM is allocated
>>on the local node where they are first accessed. It sort of looks like Windows
>>does the same thing based on your "allocate and touch" approach.
>>
>>Linux gives me a couple of approaches. One as above is the simplest. I can
>>also specify that memory be allocated on a specific node, but I am not sure that
>>is totally compatible with the shmget()/shmat() approach I am using to avoid
>>POSIX threads.
>>
>>What we have certainly works, but if Windows behaves like Linux, so that I can
>>malloc up front, and then touch as the threads get initialized, overall the code
>>will be a bit simpler since then both will be doing the same thing...
>>
>>Hence my question... :)
>
>I would not bet that malloc() does not touch memory it allocates, or that it
>always returns not-yet-committed memory, or that the memory is cache line aligned.
>If you noticed, for NUMA I am using not malloc() but Windows API calls that first
>reserve and then commit memory.
>
>Your change will probably work, but it will require extra testing...
>
>Thanks,
>Eugene

The memory I am allocating via shmget() is necessarily cache-line aligned, because it always starts on a page boundary and is allocated only in multiples of the hardware page size...

Linux has a similar function. It is possible to say "this must be put on node x", and then any memory pages you touch to "fault in" beyond that point get their physical pages from node x's memory. But I can't directly use the built-in intrinsic for that, as I need shmget() so that the memory is shared across the processes, as opposed to malloc() memory, which would become "private" since it is not shared by definition...

The only headache I have found is that it is hard to verify where something is loaded into physical RAM. I did some unkosher things to see which physical RAM pages were being used for the split blocks, and it was all done correctly. With threads, there is a problem with the first page of a thread's stack being allocated on the node that creates the thread... With fork() even this is not an issue, due to the Unix copy-on-write VM approach (almost everyone uses copy-on-write in Unix, and I suspect Windows does as well...).
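For readers following along, the "shmget() up front, touch after fork()" scheme discussed above could be sketched in C roughly as follows. This is only an illustration under stated assumptions, not Crafty's actual code: NPROCS, BLOCK_BYTES, and the three function names are hypothetical, it assumes Linux's default first-touch page placement, and it leaves out pinning each process to a CPU or node, which a real program would do before touching its block.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>

/* Hypothetical sizes, chosen only for illustration. */
#define NPROCS      4
#define BLOCK_BYTES (64 * 1024)

/* Create and attach one shared segment with a block per process.
   The parent must NOT write to it, so no physical pages are placed yet. */
static char *alloc_shared_blocks(void)
{
    int shmid = shmget(IPC_PRIVATE, (size_t)NPROCS * BLOCK_BYTES,
                       IPC_CREAT | 0600);
    if (shmid < 0)
        return NULL;
    void *base = shmat(shmid, NULL, 0);
    /* Mark the segment for removal; it lives until the last detach. */
    shmctl(shmid, IPC_RMID, NULL);
    return base == (void *)-1 ? NULL : (char *)base;
}

/* Address of the block that belongs to process number i. */
static char *block_ptr(char *base, int i)
{
    assert(i >= 0 && i < NPROCS);
    return base + (size_t)i * BLOCK_BYTES;
}

/* Fork NPROCS children; each child first-touches only its own block,
   so the kernel backs those pages from the node the child runs on.
   Returns 0 if every child was started and exited cleanly. */
static int first_touch_blocks(char *base)
{
    pid_t pids[NPROCS];
    int started = 0, ok = 1;

    for (int i = 0; i < NPROCS; i++) {
        pid_t pid = fork();
        if (pid < 0) { ok = 0; break; }
        if (pid == 0) {
            /* Child: the attachment is inherited across fork(). */
            memset(block_ptr(base, i), i + 1, BLOCK_BYTES);
            _exit(0);
        }
        pids[started++] = pid;
    }
    for (int i = 0; i < started; i++) {
        int status;
        if (waitpid(pids[i], &status, 0) < 0 ||
            !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            ok = 0;
    }
    return ok ? 0 : -1;
}
```

A caller would invoke alloc_shared_blocks() once in the parent, then first_touch_blocks() on the returned pointer. The fill value i + 1 is there only so the parent can confirm afterwards that each child's writes are visible through the shared mapping; it says nothing about which node the pages ended up on, which is the hard part to verify, as noted above.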
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.