Computer Chess Club Archives

Search

Terms
Messages

Subject: Re: about SEE of Crafty

Author: Robert Hyatt
Date: 09:33:33 01/06/04
On January 06, 2004 at 12:18:26, martin fierz wrote:

>On January 06, 2004 at 12:09:05, Robert Hyatt wrote:
>
>>On January 06, 2004 at 11:57:01, martin fierz wrote:
>>
>>>On January 06, 2004 at 11:49:32, Robert Hyatt wrote:
>>>
>>>>On January 06, 2004 at 11:41:59, martin fierz wrote:
>>>>
>>>>>On January 06, 2004 at 11:35:42, Gerd Isenberg wrote:
>>>>>
>>>>>>On January 06, 2004 at 10:56:00, martin fierz wrote:
>>>>>>
>>>>>>>On January 06, 2004 at 09:40:33, Robert Hyatt wrote:
>>>>>>>
>>>>>>>>Maybe there is some confusion here.  There is _no_ cost for "allocating"
>>>>>>>>local variables in a function.  And I _do_ mean _NO_.  These values are
>>>>>>>>simply "on the stack" and when the function is entered, the stack pointer has
>>>>>>>>some constant subtracted from it to leave a "hole" that is used to hold local
>>>>>>>>function variables.  no malloc() or anything like that is done by modern
>>>>>>>>compilers, hence zero cost.
>>>>>>>
>>>>>>>hi bob,
>>>>>>>
>>>>>>>is this really true? i used to have a HUGE structure describing a move in the
>>>>>>>first version of my chess program (don't ask, please...). in my negamax function
>>>>>>>i had this
>>>>>>>
>>>>>>>int negamax(int alpha, int beta, int depth)
>>>>>>>   {
>>>>>>>   MOVE movelist[MAXMOVES];
>>>>>>>   ...other variables...
>>>>>>>
>>>>>>>   ...typical negamax code...
>>>>>>>
>>>>>>>   }
>>>>>>>
>>>>>>>i saw clear speed differences depending on the value i used for MAXMOVES - and
>>>>>>>in my tests, i made sure that using 100 would never overflow in the positions i
>>>>>>>was searching, then i set it to e.g. 200 and the program would get a few %
>>>>>>>slower, although these additional 100 array elements were never used.
>>>>>>>
>>>>>>>how can this be explained? i'm not memsetting the array to 0 or anything.
>>>>>>
>>>>>>Hi Martin,
>>>>>>
>>>>>>that was my point in my reply to Bob - instruction length.
>>>>>>
>>>>>>Did you expect the assembly,
>>>>>
>>>>>hi gerd,
>>>>>
>>>>>for heaven's sake no! to me, __asm is about equivalent to chinese, i'm not an
>>>>>expert like you :-)
>>>>>
>>>>>>specially the bp-relative offsets of the other
>>>>>>locals load/stores? In C, per default all locals of a function on the stack,
>>>>>>including actual parameters are addressed relative to the base pointer register.
>>>>>>As long those offsets are in a signed byte range (-128..+127), with X86-32 only
>>>>>>one byte (to sign extend) is necessary, otherwise the complete 32-bit offset is
>>>>>>needed.
>>>>>
>>>>>...and this sounds a bit like chinese too! forgive me for my stupidity in this
>>>>>area! are you saying that if the storage of my locals exceeds X bytes (X=256?),
>>>>>then my function will get slower? and that the execution speed of my function
>>>>>would depend on the storage amount in a step-like way: locals need less than X
>>>>>bytes => fast; else => slow?
>>>>>because that is not what i saw. i didn't measure exactly, but the larger i made
>>>>>MAXMOVES, the slower the program got.
>>>>
>>>>Once you go beyond 128 bytes, you get into the problem Gerd mentioned.  The
>>>>problem is that we need to use register + displacement + index.  The
>>>>displacement can be 1 byte or 4, which makes the instruction longer, and hence
>>>>slower.  I had not considered that and it _does_ make a difference.  But once
>>>>you blow 128 bytes, it should not get worse, and the difference is not going to
>>>>be huge (ie here huge is 1%) anyway...
>>>
>>>i see. so that means i should rethink all my large data structures :-)
>>>BTW, do you just have a huge movelist with many 100 moves as maximum in crafty
>>>to handle positions with *many* moves - these crazy ones that people sometimes
>>>construct? because i noticed my program slowing down when i increased the
>>>MAXMOVES variable, i'm probably safe for normal positions, but certainly not for
>>>the crazy ones. what number is considered safe for even crazy positions?
>>
>>what I have is one big moves[n] array.  And a last[ply] array.  last[ply]
>>always points to 1 beyond the last move generated, so that at ply+1 I start
>>the move list at last[ply-1] and when I am done, I remember the last[ply] as
>>pointing to the next unused entry in the move list.  This way my movelist is
>>compact for positions where there are not many moves, which helps with the
>>cache-line aliasing problem.  If you make it a 2d array, something like
>>moves[64][220] (64 plies, 220 moves per ply as we know that there is a position
>>with 218 legal moves) then if you don't use all of that 220 for a ply, you
>>have "gaps" in how that maps into cache that is not very efficient, since
>>on average, you will always have one cache line that is 1/2 unused.  My approach
>>packs the moves together into contiguous virtual memory, which packs them
>>together in cache as well as can be done knowing that cache uses physical
>>memory addresses for mapping, not virtual.
>
>nice trick - thanks for the explanation! i'll have to try in my program :-)
>
>cheers
>  martin
>

It's been around forever.  I always did it like this, in fact, all the way
back to the 1970's at least and maybe earlier (hard to remember what I did
in 1968 of course.  :)  )

>>
>>>
>>>>But as I mentioned previously, you are pushing the stack size out, and that
>>>>will have perhaps unexpected effects on cache.  A global will always end up
>>>>in the same place in memory and take up one chunk of cache.  Locals will
>>>>stack up if you are talking recursion, and take up multiple chunks of cache,
>>>>displacing something else.
>>>
>>>thanks for the explanation - that cache thing makes sense!
>>
>>Also Gerd's instruction length point was important.  I had simply ignored
>>that since I wasn't actually writing assembly code at the time when I
>>answered.
>>
>>
>>
>>>
>>>cheers
>>> martin
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.