Author: Robert Hyatt
Date: 09:33:33 01/06/04
Go up one level in this thread
On January 06, 2004 at 12:18:26, martin fierz wrote: >On January 06, 2004 at 12:09:05, Robert Hyatt wrote: > >>On January 06, 2004 at 11:57:01, martin fierz wrote: >> >>>On January 06, 2004 at 11:49:32, Robert Hyatt wrote: >>> >>>>On January 06, 2004 at 11:41:59, martin fierz wrote: >>>> >>>>>On January 06, 2004 at 11:35:42, Gerd Isenberg wrote: >>>>> >>>>>>On January 06, 2004 at 10:56:00, martin fierz wrote: >>>>>> >>>>>>>On January 06, 2004 at 09:40:33, Robert Hyatt wrote: >>>>>>> >>>>>>>>Maybe there is some confusion here. There is _no_ cost for "allocating" >>>>>>>>local variables in a function. And I _do_ mean _NO_. These values are >>>>>>>>simply "on the stack" and when the function is entered, the stack pointer has >>>>>>>>some constant subtracted from it to leave a "hole" that is used to hold local >>>>>>>>function variables. no malloc() or anything like that is done by modern >>>>>>>>compilers, hence zero cost. >>>>>>> >>>>>>>hi bob, >>>>>>> >>>>>>>is this really true? i used to have a HUGE structure describing a move in the >>>>>>>first version of my chess program (don't ask, please...). in my negamax function >>>>>>>i had this >>>>>>> >>>>>>>int negamax(int alpha, int beta, int depth) >>>>>>> { >>>>>>> MOVE movelist[MAXMOVES]; >>>>>>> ...other variables... >>>>>>> >>>>>>> ...typical negamax code... >>>>>>> >>>>>>> } >>>>>>> >>>>>>>i saw clear speed differences depending on the value i used for MAXMOVES - and >>>>>>>in my tests, i made sure that using 100 would never overflow in the positions i >>>>>>>was searching, then i set it to e.g. 200 and the program would get a few % >>>>>>>slower, although these additional 100 array elements were never used. >>>>>>> >>>>>>>how can this be explained? i'm not memsetting the array to 0 or anything. >>>>>> >>>>>>Hi Martin, >>>>>> >>>>>>that was my point in my reply to Bob - instruction length. >>>>>> >>>>>>Did you expect the assembly, >>>>> >>>>>hi gerd, >>>>> >>>>>for heaven's sake no! to me, __asm is about equivalent to chinese, i'm not an >>>>>expert like you :-) >>>>> >>>>>>specially the bp-relative offsets of the other >>>>>>locals load/stores? In C, per default all locals of a function on the stack, >>>>>>including actual parameters are addressed relative to the base pointer register. >>>>>>As long those offsets are in a signed byte range (-128..+127), with X86-32 only >>>>>>one byte (to sign extend) is necessary, otherwise the complete 32-bit offset is >>>>>>needed. >>>>> >>>>>...and this sounds a bit like chinese too! forgive me for my stupidity in this >>>>>area! are you saying that if the storage of my locals exceeds X bytes (X=256?), >>>>>then my function will get slower? and that the execution speed of my function >>>>>would depend on the storage amount in a step-like way: locals need less than X >>>>>bytes => fast; else => slow? >>>>>because that is not what i saw. i didn't measure exactly, but the larger i made >>>>>MAXMOVES, the slower the program got. >>>> >>>>Once you go beyond 128 bytes, you get into the problem Gerd mentioned. The >>>>problem is that we need to use register + displacement + index. The >>>>displacement can be 1 byte or 4, which makes the instruction longer, and hence >>>>slower. I had not considered that and it _does_ make a difference. But once >>>>you blow 128 bytes, it should not get worse, and the difference is not going to >>>>be huge (ie here huge is 1%) anyway... >>> >>>i see. so that means i should rethink all my large data structures :-) >>>BTW, do you just have a huge movelist with many 100 moves as maximum in crafty >>>to handle positions with *many* moves - these crazy ones that people sometimes >>>construct? because i noticed my program slowing down when i increased the >>>MAXMOVES variable, i'm probably safe for normal positions, but certainly not for >>>the crazy ones. what number is considered safe for even crazy positions? >> >>what I have is one big moves[n] array. And a last[ply] array. last[ply] >>always points to 1 beyond the last move generated, so that at ply+1 I start >>the move list at last[ply-1] and when I am done, I remember the last[ply] as >>pointing to the next unused entry in the move list. This way my movelist is >>compact for positions where there are not many moves, which helps with the >>cache-line aliasing problem. If you make it a 2d array, something like >>moves[64][220] (64 plies, 220 moves per ply as we know that there is a position >>with 218 legal moves) then if you don't use all of that 220 for a ply, you >>have "gaps" in how that maps into cache that is not very efficient, since >>on average, you will always have one cache line that is 1/2 unused. My approach >>packs the moves together into contiguous virtual memory, which packs them >>together in cache as well as can be done knowing that cache uses physical >>memory addresses for mapping, not virtual. > >nice trick - thanks for the explanation! i'll have to try in my program :-) > >cheers > martin > It's been around forever. I always did it like this, in fact, all the way back to the 1970's at least and maybe earlier (hard to remember what I did in 1968 of course. :) ) >> >>> >>>>But as I mentioned previously, you are pushing the stack size out, and that >>>>will have perhaps unexpected effects on cache. A global will always end up >>>>in the same place in memory and take up one chunk of cache. Locals will >>>>stack up if you are talking recursion, and take up multiple chunks of cache, >>>>displacing something else. >>> >>>thanks for the explanation - that cache thing makes sense! >> >>Also Gerd's instruction length point was important. I had simply ignored >>that since I wasn't actually writing assembly code at the time when I >>answered. >> >> >> >>> >>>cheers >>> martin
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.