Author: martin fierz
Date: 09:18:26 01/06/04
Go up one level in this thread
On January 06, 2004 at 12:09:05, Robert Hyatt wrote: >On January 06, 2004 at 11:57:01, martin fierz wrote: > >>On January 06, 2004 at 11:49:32, Robert Hyatt wrote: >> >>>On January 06, 2004 at 11:41:59, martin fierz wrote: >>> >>>>On January 06, 2004 at 11:35:42, Gerd Isenberg wrote: >>>> >>>>>On January 06, 2004 at 10:56:00, martin fierz wrote: >>>>> >>>>>>On January 06, 2004 at 09:40:33, Robert Hyatt wrote: >>>>>> >>>>>>>Maybe there is some confusion here. There is _no_ cost for "allocating" >>>>>>>local variables in a function. And I _do_ mean _NO_. These values are >>>>>>>simply "on the stack" and when the function is entered, the stack pointer has >>>>>>>some constant subtracted from it to leave a "hole" that is used to hold local >>>>>>>function variables. no malloc() or anything like that is done by modern >>>>>>>compilers, hence zero cost. >>>>>> >>>>>>hi bob, >>>>>> >>>>>>is this really true? i used to have a HUGE structure describing a move in the >>>>>>first version of my chess program (don't ask, please...). in my negamax function >>>>>>i had this >>>>>> >>>>>>int negamax(int alpha, int beta, int depth) >>>>>> { >>>>>> MOVE movelist[MAXMOVES]; >>>>>> ...other variables... >>>>>> >>>>>> ...typical negamax code... >>>>>> >>>>>> } >>>>>> >>>>>>i saw clear speed differences depending on the value i used for MAXMOVES - and >>>>>>in my tests, i made sure that using 100 would never overflow in the positions i >>>>>>was searching, then i set it to e.g. 200 and the program would get a few % >>>>>>slower, although these additional 100 array elements were never used. >>>>>> >>>>>>how can this be explained? i'm not memsetting the array to 0 or anything. >>>>> >>>>>Hi Martin, >>>>> >>>>>that was my point in my reply to Bob - instruction length. >>>>> >>>>>Did you expect the assembly, >>>> >>>>hi gerd, >>>> >>>>for heaven's sake no! to me, __asm is about equivalent to chinese, i'm not an >>>>expert like you :-) >>>> >>>>>specially the bp-relative offsets of the other >>>>>locals load/stores? In C, per default all locals of a function on the stack, >>>>>including actual parameters are addressed relative to the base pointer register. >>>>>As long those offsets are in a signed byte range (-128..+127), with X86-32 only >>>>>one byte (to sign extend) is necessary, otherwise the complete 32-bit offset is >>>>>needed. >>>> >>>>...and this sounds a bit like chinese too! forgive me for my stupidity in this >>>>area! are you saying that if the storage of my locals exceeds X bytes (X=256?), >>>>then my function will get slower? and that the execution speed of my function >>>>would depend on the storage amount in a step-like way: locals need less than X >>>>bytes => fast; else => slow? >>>>because that is not what i saw. i didn't measure exactly, but the larger i made >>>>MAXMOVES, the slower the program got. >>> >>>Once you go beyond 128 bytes, you get into the problem Gerd mentioned. The >>>problem is that we need to use register + displacement + index. The >>>displacement can be 1 byte or 4, which makes the instruction longer, and hence >>>slower. I had not considered that and it _does_ make a difference. But once >>>you blow 128 bytes, it should not get worse, and the difference is not going to >>>be huge (ie here huge is 1%) anyway... >> >>i see. so that means i should rethink all my large data structures :-) >>BTW, do you just have a huge movelist with many 100 moves as maximum in crafty >>to handle positions with *many* moves - these crazy ones that people sometimes >>construct? because i noticed my program slowing down when i increased the >>MAXMOVES variable, i'm probably safe for normal positions, but certainly not for >>the crazy ones. what number is considered safe for even crazy positions? > >what I have is one big moves[n] array. And a last[ply] array. last[ply] >always points to 1 beyond the last move generated, so that at ply+1 I start >the move list at last[ply-1] and when I am done, I remember the last[ply] as >pointing to the next unused entry in the move list. This way my movelist is >compact for positions where there are not many moves, which helps with the >cache-line aliasing problem. If you make it a 2d array, something like >moves[64][220] (64 plies, 220 moves per ply as we know that there is a position >with 218 legal moves) then if you don't use all of that 220 for a ply, you >have "gaps" in how that maps into cache that is not very efficient, since >on average, you will always have one cache line that is 1/2 unused. My approach >packs the moves together into contiguous virtual memory, which packs them >together in cache as well as can be done knowing that cache uses physical >memory addresses for mapping, not virtual. nice trick - thanks for the explanation! i'll have to try in my program :-) cheers martin > >> >>>But as I mentioned previously, you are pushing the stack size out, and that >>>will have perhaps unexpected effects on cache. A global will always end up >>>in the same place in memory and take up one chunk of cache. Locals will >>>stack up if you are talking recursion, and take up multiple chunks of cache, >>>displacing something else. >> >>thanks for the explanation - that cache thing makes sense! > >Also Gerd's instruction length point was important. I had simply ignored >that since I wasn't actually writing assembly code at the time when I >answered. > > > >> >>cheers >> martin
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.