Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: about SEE of Crafty

Author: martin fierz

Date: 09:18:26 01/06/04

Go up one level in this thread


On January 06, 2004 at 12:09:05, Robert Hyatt wrote:

>On January 06, 2004 at 11:57:01, martin fierz wrote:
>
>>On January 06, 2004 at 11:49:32, Robert Hyatt wrote:
>>
>>>On January 06, 2004 at 11:41:59, martin fierz wrote:
>>>
>>>>On January 06, 2004 at 11:35:42, Gerd Isenberg wrote:
>>>>
>>>>>On January 06, 2004 at 10:56:00, martin fierz wrote:
>>>>>
>>>>>>On January 06, 2004 at 09:40:33, Robert Hyatt wrote:
>>>>>>
>>>>>>>Maybe there is some confusion here.  There is _no_ cost for "allocating"
>>>>>>>local variables in a function.  And I _do_ mean _NO_.  These values are
>>>>>>>simply "on the stack" and when the function is entered, the stack pointer has
>>>>>>>some constant subtracted from it to leave a "hole" that is used to hold local
>>>>>>>function variables.  no malloc() or anything like that is done by modern
>>>>>>>compilers, hence zero cost.
>>>>>>
>>>>>>hi bob,
>>>>>>
>>>>>>is this really true? i used to have a HUGE structure describing a move in the
>>>>>>first version of my chess program (don't ask, please...). in my negamax function
>>>>>>i had this
>>>>>>
>>>>>>int negamax(int alpha, int beta, int depth)
>>>>>>   {
>>>>>>   MOVE movelist[MAXMOVES];
>>>>>>   ...other variables...
>>>>>>
>>>>>>   ...typical negamax code...
>>>>>>
>>>>>>   }
>>>>>>
>>>>>>i saw clear speed differences depending on the value i used for MAXMOVES - and
>>>>>>in my tests, i made sure that using 100 would never overflow in the positions i
>>>>>>was searching, then i set it to e.g. 200 and the program would get a few %
>>>>>>slower, although these additional 100 array elements were never used.
>>>>>>
>>>>>>how can this be explained? i'm not memsetting the array to 0 or anything.
>>>>>
>>>>>Hi Martin,
>>>>>
>>>>>that was my point in my reply to Bob - instruction length.
>>>>>
>>>>>Did you expect the assembly,
>>>>
>>>>hi gerd,
>>>>
>>>>for heaven's sake no! to me, __asm is about equivalent to chinese, i'm not an
>>>>expert like you :-)
>>>>
>>>>>specially the bp-relative offsets of the other
>>>>>locals load/stores? In C, per default all locals of a function on the stack,
>>>>>including actual parameters are addressed relative to the base pointer register.
>>>>>As long those offsets are in a signed byte range (-128..+127), with X86-32 only
>>>>>one byte (to sign extend) is necessary, otherwise the complete 32-bit offset is
>>>>>needed.
>>>>
>>>>...and this sounds a bit like chinese too! forgive me for my stupidity in this
>>>>area! are you saying that if the storage of my locals exceeds X bytes (X=256?),
>>>>then my function will get slower? and that the execution speed of my function
>>>>would depend on the storage amount in a step-like way: locals need less than X
>>>>bytes => fast; else => slow?
>>>>because that is not what i saw. i didn't measure exactly, but the larger i made
>>>>MAXMOVES, the slower the program got.
>>>
>>>Once you go beyond 128 bytes, you get into the problem Gerd mentioned.  The
>>>problem is that we need to use register + displacement + index.  The
>>>displacement can be 1 byte or 4, which makes the instruction longer, and hence
>>>slower.  I had not considered that and it _does_ make a difference.  But once
>>>you blow 128 bytes, it should not get worse, and the difference is not going to
>>>be huge (ie here huge is 1%) anyway...
>>
>>i see. so that means i should rethink all my large data structures :-)
>>BTW, do you just have a huge movelist with many 100 moves as maximum in crafty
>>to handle positions with *many* moves - these crazy ones that people sometimes
>>construct? because i noticed my program slowing down when i increased the
>>MAXMOVES variable, i'm probably safe for normal positions, but certainly not for
>>the crazy ones. what number is considered safe for even crazy positions?
>
>what I have is one big moves[n] array.  And a last[ply] array.  last[ply]
>always points to 1 beyond the last move generated, so that at ply+1 I start
>the move list at last[ply-1] and when I am done, I remember the last[ply] as
>pointing to the next unused entry in the move list.  This way my movelist is
>compact for positions where there are not many moves, which helps with the
>cache-line aliasing problem.  If you make it a 2d array, something like
>moves[64][220] (64 plies, 220 moves per ply as we know that there is a position
>with 218 legal moves) then if you don't use all of that 220 for a ply, you
>have "gaps" in how that maps into cache that is not very efficient, since
>on average, you will always have one cache line that is 1/2 unused.  My approach
>packs the moves together into contiguous virtual memory, which packs them
>together in cache as well as can be done knowing that cache uses physical
>memory addresses for mapping, not virtual.

nice trick - thanks for the explanation! i'll have to try in my program :-)

cheers
  martin

>
>>
>>>But as I mentioned previously, you are pushing the stack size out, and that
>>>will have perhaps unexpected effects on cache.  A global will always end up
>>>in the same place in memory and take up one chunk of cache.  Locals will
>>>stack up if you are talking recursion, and take up multiple chunks of cache,
>>>displacing something else.
>>
>>thanks for the explanation - that cache thing makes sense!
>
>Also Gerd's instruction length point was important.  I had simply ignored
>that since I wasn't actually writing assembly code at the time when I
>answered.
>
>
>
>>
>>cheers
>> martin



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.