Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: about SEE of Crafty

Author: Robert Hyatt

Date: 09:09:05 01/06/04

Go up one level in this thread


On January 06, 2004 at 11:57:01, martin fierz wrote:

>On January 06, 2004 at 11:49:32, Robert Hyatt wrote:
>
>>On January 06, 2004 at 11:41:59, martin fierz wrote:
>>
>>>On January 06, 2004 at 11:35:42, Gerd Isenberg wrote:
>>>
>>>>On January 06, 2004 at 10:56:00, martin fierz wrote:
>>>>
>>>>>On January 06, 2004 at 09:40:33, Robert Hyatt wrote:
>>>>>
>>>>>>Maybe there is some confusion here.  There is _no_ cost for "allocating"
>>>>>>local variables in a function.  And I _do_ mean _NO_.  These values are
>>>>>>simply "on the stack" and when the function is entered, the stack pointer has
>>>>>>some constant subtracted from it to leave a "hole" that is used to hold local
>>>>>>function variables.  no malloc() or anything like that is done by modern
>>>>>>compilers, hence zero cost.
>>>>>
>>>>>hi bob,
>>>>>
>>>>>is this really true? i used to have a HUGE structure describing a move in the
>>>>>first version of my chess program (don't ask, please...). in my negamax function
>>>>>i had this
>>>>>
>>>>>int negamax(int alpha, int beta, int depth)
>>>>>   {
>>>>>   MOVE movelist[MAXMOVES];
>>>>>   ...other variables...
>>>>>
>>>>>   ...typical negamax code...
>>>>>
>>>>>   }
>>>>>
>>>>>i saw clear speed differences depending on the value i used for MAXMOVES - and
>>>>>in my tests, i made sure that using 100 would never overflow in the positions i
>>>>>was searching, then i set it to e.g. 200 and the program would get a few %
>>>>>slower, although these additional 100 array elements were never used.
>>>>>
>>>>>how can this be explained? i'm not memsetting the array to 0 or anything.
>>>>
>>>>Hi Martin,
>>>>
>>>>that was my point in my reply to Bob - instruction length.
>>>>
>>>>Did you expect the assembly,
>>>
>>>hi gerd,
>>>
>>>for heaven's sake no! to me, __asm is about equivalent to chinese, i'm not an
>>>expert like you :-)
>>>
>>>>specially the bp-relative offsets of the other
>>>>locals load/stores? In C, per default all locals of a function on the stack,
>>>>including actual parameters are addressed relative to the base pointer register.
>>>>As long those offsets are in a signed byte range (-128..+127), with X86-32 only
>>>>one byte (to sign extend) is necessary, otherwise the complete 32-bit offset is
>>>>needed.
>>>
>>>...and this sounds a bit like chinese too! forgive me for my stupidity in this
>>>area! are you saying that if the storage of my locals exceeds X bytes (X=256?),
>>>then my function will get slower? and that the execution speed of my function
>>>would depend on the storage amount in a step-like way: locals need less than X
>>>bytes => fast; else => slow?
>>>because that is not what i saw. i didn't measure exactly, but the larger i made
>>>MAXMOVES, the slower the program got.
>>
>>Once you go beyond 128 bytes, you get into the problem Gerd mentioned.  The
>>problem is that we need to use register + displacement + index.  The
>>displacement can be 1 byte or 4, which makes the instruction longer, and hence
>>slower.  I had not considered that and it _does_ make a difference.  But once
>>you blow 128 bytes, it should not get worse, and the difference is not going to
>>be huge (ie here huge is 1%) anyway...
>
>i see. so that means i should rethink all my large data structures :-)
>BTW, do you just have a huge movelist with many 100 moves as maximum in crafty
>to handle positions with *many* moves - these crazy ones that people sometimes
>construct? because i noticed my program slowing down when i increased the
>MAXMOVES variable, i'm probably safe for normal positions, but certainly not for
>the crazy ones. what number is considered safe for even crazy positions?

what I have is one big moves[n] array.  And a last[ply] array.  last[ply]
always points to 1 beyond the last move generated, so that at ply+1 I start
the move list at last[ply-1] and when I am done, I remember the last[ply] as
pointing to the next unused entry in the move list.  This way my movelist is
compact for positions where there are not many moves, which helps with the
cache-line aliasing problem.  If you make it a 2d array, something like
moves[64][220] (64 plies, 220 moves per ply as we know that there is a position
with 218 legal moves) then if you don't use all of that 220 for a ply, you
have "gaps" in how that maps into cache that is not very efficient, since
on average, you will always have one cache line that is 1/2 unused.  My approach
packs the moves together into contiguous virtual memory, which packs them
together in cache as well as can be done knowing that cache uses physical
memory addresses for mapping, not virtual.

>
>>But as I mentioned previously, you are pushing the stack size out, and that
>>will have perhaps unexpected effects on cache.  A global will always end up
>>in the same place in memory and take up one chunk of cache.  Locals will
>>stack up if you are talking recursion, and take up multiple chunks of cache,
>>displacing something else.
>
>thanks for the explanation - that cache thing makes sense!

Also Gerd's instruction length point was important.  I had simply ignored
that since I wasn't actually writing assembly code at the time when I
answered.



>
>cheers
> martin



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.