Author: Robert Hyatt
Date: 06:52:39 01/07/04
On January 07, 2004 at 08:40:02, Vasik Rajlich wrote:

>On January 06, 2004 at 11:49:32, Robert Hyatt wrote:
>
>>On January 06, 2004 at 11:41:59, martin fierz wrote:
>>
>>>On January 06, 2004 at 11:35:42, Gerd Isenberg wrote:
>>>
>>>>On January 06, 2004 at 10:56:00, martin fierz wrote:
>>>>
>>>>>On January 06, 2004 at 09:40:33, Robert Hyatt wrote:
>>>>>
>>>>>>Maybe there is some confusion here. There is _no_ cost for "allocating"
>>>>>>local variables in a function. And I _do_ mean _NO_. These values are
>>>>>>simply "on the stack", and when the function is entered, the stack pointer
>>>>>>has some constant subtracted from it to leave a "hole" that is used to hold
>>>>>>local function variables. No malloc() or anything like that is done by
>>>>>>modern compilers, hence zero cost.
>>>>>
>>>>>hi bob,
>>>>>
>>>>>is this really true? i used to have a HUGE structure describing a move in the
>>>>>first version of my chess program (don't ask, please...). in my negamax
>>>>>function i had this
>>>>>
>>>>>int negamax(int alpha, int beta, int depth)
>>>>> {
>>>>>  MOVE movelist[MAXMOVES];
>>>>>  ...other variables...
>>>>>
>>>>>  ...typical negamax code...
>>>>>
>>>>> }
>>>>>
>>>>>i saw clear speed differences depending on the value i used for MAXMOVES - and
>>>>>in my tests, i made sure that using 100 would never overflow in the positions
>>>>>i was searching, then i set it to e.g. 200 and the program would get a few %
>>>>>slower, although these additional 100 array elements were never used.
>>>>>
>>>>>how can this be explained? i'm not memsetting the array to 0 or anything.
>>>>
>>>>Hi Martin,
>>>>
>>>>that was my point in my reply to Bob - instruction length.
>>>>
>>>>Did you expect the assembly,
>>>
>>>hi gerd,
>>>
>>>for heaven's sake no! to me, __asm is about equivalent to chinese, i'm not an
>>>expert like you :-)
>>>
>>>>especially the bp-relative offsets of the other locals' load/stores? In C, by
>>>>default all locals of a function, including actual parameters, live on the
>>>>stack and are addressed relative to the base pointer register. As long as
>>>>those offsets are in a signed byte range (-128..+127), with x86-32 only one
>>>>byte (to sign extend) is necessary; otherwise the complete 32-bit offset is
>>>>needed.
>>>
>>>...and this sounds a bit like chinese too! forgive me for my stupidity in this
>>>area! are you saying that if the storage of my locals exceeds X bytes (X=256?),
>>>then my function will get slower? and that the execution speed of my function
>>>would depend on the storage amount in a step-like way: locals need less than X
>>>bytes => fast; else => slow?
>>>because that is not what i saw. i didn't measure exactly, but the larger i made
>>>MAXMOVES, the slower the program got.
>>
>>Once you go beyond 128 bytes, you get into the problem Gerd mentioned. The
>>problem is that we need to use register + displacement + index. The
>>displacement can be 1 byte or 4, which makes the instruction longer, and hence
>>slower. I had not considered that, and it _does_ make a difference. But once
>>you blow past 128 bytes, it should not get any worse, and the difference is not
>>going to be huge (i.e. here "huge" is 1%) anyway...
>>
>>But as I mentioned previously, you are pushing the stack size out, and that
>>will have perhaps unexpected effects on cache. A global will always end up
>>in the same place in memory and take up one chunk of cache. Locals will
>>stack up if you are talking recursion, and take up multiple chunks of cache,
>>displacing something else.
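To make the "zero cost" point concrete, here is a minimal C sketch; the MOVE
layout and the numbers are made up for illustration, this is not code from any
particular engine:

    /* sketch only: MOVE layout and MAXMOVES are made-up values */
    #define MAXMOVES 100

    typedef struct { int from, to, captured, promoted; } MOVE;

    int negamax(int alpha, int beta, int depth)
    {
       MOVE movelist[MAXMOVES];  /* "allocated" by one constant subtraction     */
                                 /* from the stack pointer in the prologue      */
                                 /* (e.g. "sub esp,N" on x86-32); no malloc(),  */
                                 /* no per-node work proportional to MAXMOVES   */
       int i, best;              /* addressed as base pointer + offset; if the  */
                                 /* frame grows past the -128..+127 range, the  */
                                 /* compiler may need 4-byte displacements, and */
                                 /* each recursion level touches that much more */
                                 /* stack, hence more cache lines               */

       /* ...typical negamax code would go here... */
       (void)movelist; (void)i; (void)best; (void)depth;
       return alpha < beta ? alpha : beta;   /* placeholder return, sketch only */
    }

Making MAXMOVES bigger adds no instructions to the prologue; what changes is
the size of the stack frame and the encoding of the other locals' addresses.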
>>
>
>Is there some good way to estimate the cost due to cache misses of adding data
>or code? It seems to me that a very well-designed engine could more or less
>"stay in the cache", or keep some important kernel in the cache, while adding
>some large structure to this could in the worst case start a sort of
>cache-cycling, where there are almost no hits.

There are a few basic rules:

(1) smaller is better.

(2) temporal locality. Variables that are used close together in terms of time
of reference should be close together in memory so they appear in one cache
line.

(3) cache-memory aliasing is an issue that most O/S memory managers seem to
ignore, where multiple real pages of a process end up direct-mapping to the
same cache lines, causing thrashing... There is little you can do about this,
however, from a programming point of view. Every time you run the program you
get a different aliasing, unless the O/S uses an approach known as "page
coloring" to fill in the physical address space so as to provide an optimal
cache mapping.

(4) random addresses hurt, due to the TLB miss penalty of 1-2 extra
memory-latency delays added to random-address access times.

(5) sequential memory referencing is faster, due to the pre-fetching triggered
by long cache line sizes.

There are many others, but in many cases it really helps to know the actual
target machine. The above are pretty general rules.

>
>Probably it's better not to worry too much about this, but maybe there are some
>rules of thumb regarding the tradeoffs between adding some data structure or
>specialized routine and the # of instructions saved.

As a general rule, the number of instructions is less important than the number
of cache-line misses. You can execute several hundred instructions in the time
it takes to fetch a memory word that is not currently in cache.

>
>Vas
>
>>
>>>
>>>cheers
>>> martin
>>>
>>>>
>>>>Gerd
>>>>
>>>>>
>>>>>like uri, i thought the program was allocating memory every time it entered
>>>>>the function, and that that was taking time. i ended up making my MOVE
>>>>>variable much smaller - although it's still much larger than crafty's :-)
>>>>>
>>>>>i thought perhaps i should allocate the movelist beforehand as a global, like
>>>>>this:
>>>>>
>>>>>MOVE *movelist[MAX_SEARCH_DEPTH];
>>>>>for(i=0;i<MAX_SEARCH_DEPTH;i++)
>>>>>  movelist[i] = malloc(sizeof(MOVE)*MAXMOVES);
>>>>>
>>>>>and then in the negamax function use the appropriate movelist for that search
>>>>>depth. that turned out to be significantly slower than the local variable
>>>>>approach. again, i don't understand why that would be... - anyone?
>>>>>
>>>>>cheers
>>>>> martin
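On the last quoted question (why the preallocated malloc() version was slower
than the plain local array), one way to picture it is to compare the two
layouts side by side. This is a sketch only; the MOVE layout and sizes are
made up for illustration:

    /* With one heap block per ply, every access costs an extra pointer load,
       and the blocks can land anywhere in the heap, so consecutive plies need
       not share cache lines and the sequential pre-fetching of rule (5) is
       lost.  A single flat global array keeps every ply's list at a fixed
       address and packs the lists back to back. */

    #include <stdlib.h>

    #define MAX_SEARCH_DEPTH 64
    #define MAXMOVES         200

    typedef struct { int from, to, captured, promoted; } MOVE;  /* made-up layout */

    /* variant 1: what the quoted code does - one heap block per ply */
    MOVE *movelist_ptr[MAX_SEARCH_DEPTH];

    void init_lists(void)
    {
        int i;
        for (i = 0; i < MAX_SEARCH_DEPTH; i++)
            movelist_ptr[i] = malloc(sizeof(MOVE) * MAXMOVES);  /* scattered blocks */
    }

    /* variant 2: one flat global array, no indirection, contiguous memory */
    MOVE movelist[MAX_SEARCH_DEPTH][MAXMOVES];

    /* access patterns:  movelist_ptr[ply][n]  = load the pointer, then the element
                         movelist[ply][n]      = one address computation, one load */

The flat array fits rules (2) and (5) above, while the per-depth malloc()
blocks add an indirection on every access and give up control over where the
data sits in memory, which could account for the slowdown martin saw.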