Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: about SEE of Crafty

Author: Vasik Rajlich

Date: 05:40:02 01/07/04

Go up one level in this thread


On January 06, 2004 at 11:49:32, Robert Hyatt wrote:

>On January 06, 2004 at 11:41:59, martin fierz wrote:
>
>>On January 06, 2004 at 11:35:42, Gerd Isenberg wrote:
>>
>>>On January 06, 2004 at 10:56:00, martin fierz wrote:
>>>
>>>>On January 06, 2004 at 09:40:33, Robert Hyatt wrote:
>>>>
>>>>>Maybe there is some confusion here.  There is _no_ cost for "allocating"
>>>>>local variables in a function.  And I _do_ mean _NO_.  These values are
>>>>>simply "on the stack" and when the function is entered, the stack pointer has
>>>>>some constant subtracted from it to leave a "hole" that is used to hold local
>>>>>function variables.  no malloc() or anything like that is done by modern
>>>>>compilers, hence zero cost.
>>>>
>>>>hi bob,
>>>>
>>>>is this really true? i used to have a HUGE structure describing a move in the
>>>>first version of my chess program (don't ask, please...). in my negamax function
>>>>i had this
>>>>
>>>>int negamax(int alpha, int beta, int depth)
>>>>   {
>>>>   MOVE movelist[MAXMOVES];
>>>>   ...other variables...
>>>>
>>>>   ...typical negamax code...
>>>>
>>>>   }
>>>>
>>>>i saw clear speed differences depending on the value i used for MAXMOVES - and
>>>>in my tests, i made sure that using 100 would never overflow in the positions i
>>>>was searching, then i set it to e.g. 200 and the program would get a few %
>>>>slower, although these additional 100 array elements were never used.
>>>>
>>>>how can this be explained? i'm not memsetting the array to 0 or anything.
>>>
>>>Hi Martin,
>>>
>>>that was my point in my reply to Bob - instruction length.
>>>
>>>Did you expect the assembly,
>>
>>hi gerd,
>>
>>for heaven's sake no! to me, __asm is about equivalent to chinese, i'm not an
>>expert like you :-)
>>
>>>specially the bp-relative offsets of the other
>>>locals load/stores? In C, per default all locals of a function on the stack,
>>>including actual parameters are addressed relative to the base pointer register.
>>>As long those offsets are in a signed byte range (-128..+127), with X86-32 only
>>>one byte (to sign extend) is necessary, otherwise the complete 32-bit offset is
>>>needed.
>>
>>...and this sounds a bit like chinese too! forgive me for my stupidity in this
>>area! are you saying that if the storage of my locals exceeds X bytes (X=256?),
>>then my function will get slower? and that the execution speed of my function
>>would depend on the storage amount in a step-like way: locals need less than X
>>bytes => fast; else => slow?
>>because that is not what i saw. i didn't measure exactly, but the larger i made
>>MAXMOVES, the slower the program got.
>
>Once you go beyond 128 bytes, you get into the problem Gerd mentioned.  The
>problem is that we need to use register + displacement + index.  The
>displacement can be 1 byte or 4, which makes the instruction longer, and hence
>slower.  I had not considered that and it _does_ make a difference.  But once
>you blow 128 bytes, it should not get worse, and the difference is not going to
>be huge (ie here huge is 1%) anyway...
>
>But as I mentioned previously, you are pushing the stack size out, and that
>will have perhaps unexpected effects on cache.  A global will always end up
>in the same place in memory and take up one chunk of cache.  Locals will
>stack up if you are talking recursion, and take up multiple chunks of cache,
>displacing something else.
>

Is there some good way to estimate the cost due to cache misses of adding data
or code? It seems to me that a very well-designed engine could more or less
"stay in the cache", or keep some important kernel in the cache, while adding
some large structure to this could in the worst case start a sort of
cache-cycling, where there are almost no hits.

Probably it's better not to worry too much about this, but maybe there are some
rules of thumb regarding the tradeoffs between adding some data structure or
specialized routine and the # of instructions saved.

Vas

>
>>
>>cheers
>>  martin
>>
>>>
>>>Gerd
>>>
>>>>
>>>>like uri, i thought the program was allocating memory every time it entered the
>>>>function, and that that was taking time. i ended up making my MOVE variable much
>>>>smaller - although it's still much larger than crafty's :-)
>>>>
>>>>i thought perhaps i should allocate the movelist beforehand as a global, like
>>>>this:
>>>>
>>>>MOVE *movelist[MAX_SEARCH_DEPTH]
>>>>for(i=0;i<MAX_SEARCH_DEPTH;i++)
>>>>   movelist[i] = malloc(sizeof(MOVE)*MAXMOVES)
>>>>
>>>>and then in the negamax function use the appropriate movelist for that search
>>>>depth. that turned out to be significantly slower than the local variable
>>>>approach. again, i don't understand why that would be... - anyone?
>>>>
>>>>cheers
>>>>  martin



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.