Computer Chess Club Archives


Search

Terms

Messages

Subject: To Sune: BSR/BSF sliding attacks explained better (I hope!)

Author: Wylie Garvin

Date: 01:06:54 02/03/02


Hi Sune (and any else),

   I'm going to have another shot at explaining the attack generation technique
I use on x86.  Note that I still haven't had a full version of this running to
test the speed of it yet; my version is written in assembly.  I have no idea if
it performs well when expressed in C, but it's probably at least as fast as any
of the other techniques and requires only a few kilobytes of table data.

   First, imagine you have these C declarations:

enum { SLIDE_E,SLIDE_SE,SLIDE_S,SLIDE_SW,SLIDE_W,SLIDE_NW,SLIDE_N,SLIDE_NE };
typedef unsigned long long BB;

BB rayTable[64*8];  // 4096 bytes

unsigned LoBit(BB bb);  // i.e. like the BSF instruction, but for 64 bits.
unsigned HiBit(BB bb);  // i.e. like the BSR instruction, but for 64 bits.
// assume that LoBit and HiBit return garbage if (!bb).

    For this discussion, assume square H1 is 0, G1 is 1, .. B1 is 6, A1 is 7, H2
is 8, ... H7 is 48, .. H8 is 56, .. A8 is 63.  In that case, the offset you add
to a square number to move in the corresponding direction (assuming you aren't
at the appropriate edge of the board already!) is:
    Direction   Offset     Direction   Offset
        E         -1           W         +1
       SE         -9          NW         +9
        S         -8           N         +8
       SW         -7          NE         +7
You should open up WinBoard and imagine my numbering scheme and verify these
offsets yourself.  Then, imagine you have a table with 64*8 "ray masks", one for
each origin square and direction.  Now suppose the direction was E,SE,S or SW.
Then you would do something like this:
    BB rayMask = rayTable[square*8+SLIDE_NE];
    BB allPieces = ...;
    unsigned rayEnd = LoBit(rayMask & allPieces);
For a ray direction like SLIDE_NE, which has a positive offset, LoBit is
appropriate.  Since the source bit is clear, and all the bits below it are not
on the ray, and all the bits *not on the ray* are clear, the result of the LoBit
operation will be the square number of the *first piece* you bump into as you
slide along the ray.  If the ray had a negative offset, you would use HiBit
instead.

Now, the second part is, if a ray mask would otherwise be empty, set the source
square's bit in it.  This guarantees there will be something for LoBit/HiBit to
find on the "ray".  The source square must NOT be included if there are any
other squares on the ray though, or it will be hit every time.

So, some C code to figure out what squares are attacked might look like:

// disclaimer: all this code is not tested and not optimized!!

// This is sort of like the assembly I use to choose captures to generate.
BB scan_bq_rays(unsigned srcSquare, BB allPieces, BB enemyPieces)  {
    BB result = 0L;
    result |= (1L<<(LoBit(allPieces & rayTable[srcSquare*8+SLIDE_NE]));
    result |= (1L<<(LoBit(allPieces & rayTable[srcSquare*8+SLIDE_NW]));
    result |= (1L<<(HiBit(allPieces & rayTable[srcSquare*8+SLIDE_SE]));
    result |= (1L<<(HiBit(allPieces & rayTable[srcSquare*8+SLIDE_SW]));
    return (result & enemyPieces);
}
BB scan_rq_rays(unsigned srcSquare, BB allPieces, BB enemyPieces)  {
    // ..similarly..
}

// This is sort of like the assembly I use when generating non-captures.
BB make_bq_mask(unsigned srcSquare, BB allPieces)  {
    BB result = 0L;
    unsigned sqNe = LoBit(allPieces & rayTable[srcSquare*8+SLIDE_NE]);
    unsigned sqSw = HiBit(allPieces & rayTable[srcSquare*8+SLIDE_SW]);
    result |= (rayTable[sqNe*8+SLIDE_SW] & rayTable[sqSw*8+SLIDE_NE];
    unsigned sqNw = LoBit(allPieces & rayTable[srcSquare*8+SLIDE_NW]);
    unsigned sqSe = HiBit(allPieces & rayTable[srcSquare*8+SLIDE_SE]);
    result |= (rayTable[sqNw*8+SLIDE_SE] & rayTable[sqSe*8+SLIDE_NW];
    return (result & ~allPieces);
}
BB make_rq_mask(unsigned srcSquare, BB allPieces)  {
    // ..similarly..
}

That all should be easier to understand than the first half of my message
before.  :)

As to the second half, here's what that was getting at:
  (1) Suppose, instead of LoBit() and HiBit() operations, you had LoBit32() and
HiBit32() operations that worked on a 32-bit unsigned.  Then you might end up
doing stuff more like this:

enum {
    SLIDE_E0, SLIDE_E1,    SLIDE_SE0, SLIDE_SE1,
    SLIDE_S0, SLIDE_S1,    SLIDE_SW0, SLIDE_SW1,
    SLIDE_W0, SLIDE_W1,    SLIDE_NW0, SLIDE_NW1,
    SLIDE_N0, SLIDE_N1,    SLIDE_NE0, SLIDE_NE1
};
typedef unsigned long long BB;

unsigned rayTable[64*16];  // 4096 bytes

BB scan_bq_rays(unsigned srcSq, BB allP, BB enemyP)  {
    BB result;
    result.lo = result.hi = 0;
    if (srcSq < 32)  {
        // srcSq is in lo half --> HiBit32 never reaches hi half.
        result.lo |= (1 << (HiBit32(allP & rayTable[srcSq*16+SLIDE_SE0]));
        result.lo |= (1 << (HiBit32(allP & rayTable[srcSq*16+SLIDE_SW0]));
        unsigned temp = allP & rayTable[srcSq*16 + SLIDE_NE0];
        if (temp)  {
            result.lo |= (1 << (LoBit32(temp));
        } else  {
            result.hi |= (1 << (LoBit32(allP & rayTable[srcSq*16+SLIDE_NE1]));
        }
        temp = allP & rayTable[srcSq*16 + SLIDE_NW0];
        if (temp)  {
            result.lo |= (1 << (LoBit32(temp));
        } else  {
            result.hi |= (1 << (LoBit32(allP & rayTable[srcSq*16+SLIDE_NW1]));
        }
    } else  {
        // srcSq is in hi half --> LoBit32 never reaches lo half.
        result.hi |= (1 << (LoBit32(allP & rayTable[srcSq*16+SLIDE_NE1]));
        result.hi |= (1 << (LoBit32(allP & rayTable[srcSq*16+SLIDE_NW1]));
        unsigned temp = allP & rayTable[srcSq*16 + SLIDE_SE1];
        if (temp)  {
            result.hi |= (1 << (HiBit32(temp));
        } else  {
            result.lo |= (1 << (HiBit32(allP & rayTable[srcSq*16+SLIDE_SE0]));
        }
        temp = allP & rayTable[srcSq*16 + SLIDE_SW1];
        if (temp)  {
            result.hi |= (1 << (HiBit32(temp));
        } else  {
            result.lo |= (1 << (HiBit32(allP & rayTable[srcSq*16+SLIDE_SW0]));
        }
    }
    result.lo &= enemyP.lo;
    result.hi &= enemyP.hi;
    return result;
}

So that code is sort of analogous to what my assembly code actually does.
However, it uses the BSF/BSR instructions in place of LoBit32/HiBit32, and it
keeps the result board in two 32-bit registers and uses the BTS instruction to
turn on individual bits, instead of stuff like "result.lo |= (1 << x)".

(2) Now, I will try and explain how to shrink the tables from 4K to 2.75K
without slowing that code down at all...
    First, observe that if the srcSq is in the lo half of the board, then only
six of the eight dwords for that square (for diagonal rays) in rayTable are
used.  Both halves of the NE and NW rays are used, but only the low half of the
SE and SW rays are used.  If you implement the similar scan_rq_rays and
make_rq_mask functions, then you will find that the N ray uses both halves,
while the S, W and E rays only use the lo half.  So in total, only 11 of the 16
bitboard halves from rayTable for a particular square in the lo half of the
board are used.  The opposite situation exists for source squares in the hi half
of the board, and again 11 halves are used.  Suppose we took the constants given
above and made two sets of constants, one for the lo half of the board and one
for the hi half:
enum {
    SLIDE_LE0,(SLIDE_LE1),   SLIDE_LSE0,(SLIDE_LSE1),
    SLIDE_LS0,(SLIDE_LS1),   SLIDE_LSW0,(SLIDE_LSW1),
    SLIDE_LW0,(SLIDE_LW1),   SLIDE_LNW0, SLIDE_LNW1,
    SLIDE_LN0, SLIDE_LN1,    SLIDE_LNE0, SLIDE_LNE1,

   (SLIDE_HE0),SLIDE_HE1,    SLIDE_HSE0, SLIDE_HSE1,
    SLIDE_HS0, SLIDE_HS1,    SLIDE_HSW0, SLIDE_HSW1,
   (SLIDE_HW0),SLIDE_HW1,   (SLIDE_HNW0),SLIDE_HNW1,
   (SLIDE_HN0),SLIDE_HN1,   (SLIDE_HNE0),SLIDE_HNE1
};
Notice that I have put brackets around 10 of these 32 constants.  These are the
ones that are NEVER USED in the code.  In effect, the code knows that these
bitboard halves in the rayTable are zero for every square on the proper half of
the board.  So you can re-number them and make a new enum:
enum {
    SLIDE_LE0 =0,  SLIDE_HE1 =0,
    SLIDE_LW0 =1,  SLIDE_HW1 =1,
    SLIDE_LN0 =2,  SLIDE_HS0 =2,
    SLIDE_LN1 =3,  SLIDE_HS1 =3,
    SLIDE_LS0 =4,  SLIDE_HN1 =4,
    SLIDE_LSE0=5,  SLIDE_NE1 =5,
    SLIDE_LSW0=6,  SLIDE_NW1 =6,
    SLIDE_LNW0=7,  SLIDE_SE0 =7,
    SLIDE_LNW1=8,  SLIDE_SE1 =8,
    SLIDE_LNE0=9,  SLIDE_SW0 =9,
    SLIDE_LNE1=10, SLIDE_SW1 =10
};

   That isn't exactly the numbering I use; it doesn't really matter what
numbering you use as long as it matches the contents of your 2816-byte rayTable.

   So, instead of using the *common sub-expression* (srcSq*16) in the rayTable
index, you would use (srcSq*11).  A non-stupid compiler will of course calculate
this value ahead of time and re-use it; if you want to be sure, calculate it
into a temporary and use the temporary throughout the function.

hope some of this is useful,
Wylie



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.