Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: how do you detect patterns efficiently?

Author: martin fierz

Date: 13:05:14 02/05/02

Go up one level in this thread


On February 05, 2002 at 08:17:35, Heiner Marxen wrote:

>On February 04, 2002 at 21:42:49, martin fierz wrote:
>
>>aloha!
>
>Hi!
>
>>my question is related to checkers, but it would be the same in computer chess
>>with bitboards:
>>
>>i represent a board with 4 32-bit ints (checkers has 4 piece types, 32 squares).
>>in my evaluation i would like to add some patterns, which crop up every now and
>>then.
>
>Looking at your code below, which does not amount to hundreds of instructions,
>I would not be bothered, first.  Have you profiled your program?
>And found this to be a hot spot?

i have several patterns i want to detect. the more, the merrier :-)
i only included one pattern detection as an example.


>
>Ok, lets assume such code is needed several times, and we are concerned
>about its efficiency...
>
>
>> let's say, i want to find all positions with the property that a specific
>>subset of every bitboard is present:
>>p is a position structure with the elements bm, bk, wm, wk for black men, kings
>>and their white counterparts.
>>
>>my program looks like this:
>>
>>// version I
>>if( ((p.bm&pattern.bm)==pattern.bm) &&
>>    ((p.bm&pattern.bk)==pattern.bk) &&
>         ^^ bk
>>    ((p.bm&pattern.wm)==pattern.wm) &&
>         ^^ wm
>>    ((p.bm&pattern.wk)==pattern.wk) )
>         ^^ wk
>>    eval += value;
>
>Typos like this can be overlooked easily and may stay undetected for a long
>time.

yes, this is a typo of course, but only when entering the code in CCC - i
didnt copy->paste it. still, you are right about the macro of course. i was
also wondering if it makes any difference if i hardcode the pattern.xy in
the program like this:

if((p.bm&0x000000F0) == p.bm&0x0000000F0) && etc.

or if i use variables? would the hard-coded version be faster?


>One more reason to introduce an abstraction (e.g. a macro),
>the usage of which avoids to repeatedly write the same thing again,
>with increased chances of getting it wrong.  E.g.:
>
>#define match1(p,patt,k)		(((p).k) == ((patt).k))
>#define match4(p,patt,c1,c2,c3,c4)	(   match1(p,patt,c1)	\
>					 && match1(p,patt,c2)	\
>					 && match1(p,patt,c3)	\
>					 && match1(p,patt,c4)	)
>#define match(p,patt)	match4(p,patt,bm,bk,wm,wk)
>
>if( match(p,pattern) )
>    eval += value;
>
>Now you can try different implementations of match(), without changing
>your main code.  E.g. you can experiment, which of bm,bk,wm and wk should
>be tested first.
>
>Inline functions may also work well.
>
>
>>is this any good? i thought of an alternative, let's say,
>>my total pattern consists of 7 stones. so i define the array
>>
>>// version II
>>int seven[8] = {0,0,0,0,0,0,0,1};
>>and do
>>   x = p.bm&pattern.bm | p.bk&pattern.bk | p.wm&pattern.wm | p.wk&pattern.wk;
>>   eval += value*bitcount(x);
>
>I don't see "seven[]" to be used?!

oh, another typo of course.... sorry.... the idea was
to do

eval+= value*seven[bitcount(x)];

- which is a branch-free version of the above, and also detects the whole
pattern. the point is that the pattern consists of exactly seven squares
here, if it consisted of another number of squares, i would have to
use addidional arrays like six[] etc.


>Anyhow, this one differs from the others semantically, it counts partial
>matches, also.  This may or may not be a good idea, but should not be
>decided by implementation aspects, but rather by functional demand.
>
>
>>// version III
>>or, last alternative:
>>   x = p.bm&pattern.bm | p.bk&pattern.bk | p.wm&pattern.wm | p.wk&pattern.wk;
>>   if( x == pattern.bm|pattern.bk|pattern.wm|pattern.wk)
>>      eval+=value;
>>
>>any ideas on these? looking at them now, version 3 looks best to me. is there
>>any difference on different processors? e.g. version II needs no if, thus no
>>branches, and i think different processors have different penalties for branch
>>misprediction? or does anybody have a suggestion for a more efficient version to
>>solve this? real patterns usually include only 3 bitboards (black men, white
>>men, and unoccupied squares).
>>
>>any thoughts are appreciated :-)
>>  cheers
>>    martin
>
>The two important aspects here, which are expensive, are memory fetches
>and (conditional) branches.  Currently branches are even more expensive
>than memory fetches (IMO).  Of course, one should measure the alternatives.
>That should be done with the real application, and not with a toy program,
>since the overall cache usage will be important (in the toy program all
>memory fetches will be L1-cache hits and appear to be cheap).

this is why i'm asking - i can spend lots of time writing a toy program which
will not do the same thing as my real program. or i can use the profiler which i
don't believe - the profiling code messes up the cache too. so i'm looking for
opinions on which of the above three code versions is fastest in theory for
different processors - e.g. if a processor had a bitcount function, probably
approach 2 would win. if it has huge branch misprediction penalties, also
version 2 would win. else version 1 or 3 would win. is there a difference in
these 2?

cheers
  martin



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.