Computer Chess Club Archives

Search

Terms
Messages

Subject: Re: Wow, thanks, I'll look it over :) [NT]

Author: Sune Fischer
Date: 06:35:49 02/03/02
On February 03, 2002 at 04:06:54, Wylie Garvin wrote:

>Hi Sune (and any else),
>
>   I'm going to have another shot at explaining the attack generation technique
>I use on x86.  Note that I still haven't had a full version of this running to
>test the speed of it yet; my version is written in assembly.  I have no idea if
>it performs well when expressed in C, but it's probably at least as fast as any
>of the other techniques and requires only a few kilobytes of table data.
>
>   First, imagine you have these C declarations:
>
>enum { SLIDE_E,SLIDE_SE,SLIDE_S,SLIDE_SW,SLIDE_W,SLIDE_NW,SLIDE_N,SLIDE_NE };
>typedef unsigned long long BB;
>
>BB rayTable[64*8];  // 4096 bytes
>
>unsigned LoBit(BB bb);  // i.e. like the BSF instruction, but for 64 bits.
>unsigned HiBit(BB bb);  // i.e. like the BSR instruction, but for 64 bits.
>// assume that LoBit and HiBit return garbage if (!bb).
>
>    For this discussion, assume square H1 is 0, G1 is 1, .. B1 is 6, A1 is 7, H2
>is 8, ... H7 is 48, .. H8 is 56, .. A8 is 63.  In that case, the offset you add
>to a square number to move in the corresponding direction (assuming you aren't
>at the appropriate edge of the board already!) is:
>    Direction   Offset     Direction   Offset
>        E         -1           W         +1
>       SE         -9          NW         +9
>        S         -8           N         +8
>       SW         -7          NE         +7
>You should open up WinBoard and imagine my numbering scheme and verify these
>offsets yourself.  Then, imagine you have a table with 64*8 "ray masks", one for
>each origin square and direction.  Now suppose the direction was E,SE,S or SW.
>Then you would do something like this:
>    BB rayMask = rayTable[square*8+SLIDE_NE];
>    BB allPieces = ...;
>    unsigned rayEnd = LoBit(rayMask & allPieces);
>For a ray direction like SLIDE_NE, which has a positive offset, LoBit is
>appropriate.  Since the source bit is clear, and all the bits below it are not
>on the ray, and all the bits *not on the ray* are clear, the result of the LoBit
>operation will be the square number of the *first piece* you bump into as you
>slide along the ray.  If the ray had a negative offset, you would use HiBit
>instead.
>
>Now, the second part is, if a ray mask would otherwise be empty, set the source
>square's bit in it.  This guarantees there will be something for LoBit/HiBit to
>find on the "ray".  The source square must NOT be included if there are any
>other squares on the ray though, or it will be hit every time.
>
>So, some C code to figure out what squares are attacked might look like:
>
>// disclaimer: all this code is not tested and not optimized!!
>
>// This is sort of like the assembly I use to choose captures to generate.
>BB scan_bq_rays(unsigned srcSquare, BB allPieces, BB enemyPieces)  {
>    BB result = 0L;
>    result |= (1L<<(LoBit(allPieces & rayTable[srcSquare*8+SLIDE_NE]));
>    result |= (1L<<(LoBit(allPieces & rayTable[srcSquare*8+SLIDE_NW]));
>    result |= (1L<<(HiBit(allPieces & rayTable[srcSquare*8+SLIDE_SE]));
>    result |= (1L<<(HiBit(allPieces & rayTable[srcSquare*8+SLIDE_SW]));
>    return (result & enemyPieces);
>}
>BB scan_rq_rays(unsigned srcSquare, BB allPieces, BB enemyPieces)  {
>    // ..similarly..
>}
>
>// This is sort of like the assembly I use when generating non-captures.
>BB make_bq_mask(unsigned srcSquare, BB allPieces)  {
>    BB result = 0L;
>    unsigned sqNe = LoBit(allPieces & rayTable[srcSquare*8+SLIDE_NE]);
>    unsigned sqSw = HiBit(allPieces & rayTable[srcSquare*8+SLIDE_SW]);
>    result |= (rayTable[sqNe*8+SLIDE_SW] & rayTable[sqSw*8+SLIDE_NE];
>    unsigned sqNw = LoBit(allPieces & rayTable[srcSquare*8+SLIDE_NW]);
>    unsigned sqSe = HiBit(allPieces & rayTable[srcSquare*8+SLIDE_SE]);
>    result |= (rayTable[sqNw*8+SLIDE_SE] & rayTable[sqSe*8+SLIDE_NW];
>    return (result & ~allPieces);
>}
>BB make_rq_mask(unsigned srcSquare, BB allPieces)  {
>    // ..similarly..
>}
>
>That all should be easier to understand than the first half of my message
>before.  :)
>
>As to the second half, here's what that was getting at:
>  (1) Suppose, instead of LoBit() and HiBit() operations, you had LoBit32() and
>HiBit32() operations that worked on a 32-bit unsigned.  Then you might end up
>doing stuff more like this:
>
>enum {
>    SLIDE_E0, SLIDE_E1,    SLIDE_SE0, SLIDE_SE1,
>    SLIDE_S0, SLIDE_S1,    SLIDE_SW0, SLIDE_SW1,
>    SLIDE_W0, SLIDE_W1,    SLIDE_NW0, SLIDE_NW1,
>    SLIDE_N0, SLIDE_N1,    SLIDE_NE0, SLIDE_NE1
>};
>typedef unsigned long long BB;
>
>unsigned rayTable[64*16];  // 4096 bytes
>
>BB scan_bq_rays(unsigned srcSq, BB allP, BB enemyP)  {
>    BB result;
>    result.lo = result.hi = 0;
>    if (srcSq < 32)  {
>        // srcSq is in lo half --> HiBit32 never reaches hi half.
>        result.lo |= (1 << (HiBit32(allP & rayTable[srcSq*16+SLIDE_SE0]));
>        result.lo |= (1 << (HiBit32(allP & rayTable[srcSq*16+SLIDE_SW0]));
>        unsigned temp = allP & rayTable[srcSq*16 + SLIDE_NE0];
>        if (temp)  {
>            result.lo |= (1 << (LoBit32(temp));
>        } else  {
>            result.hi |= (1 << (LoBit32(allP & rayTable[srcSq*16+SLIDE_NE1]));
>        }
>        temp = allP & rayTable[srcSq*16 + SLIDE_NW0];
>        if (temp)  {
>            result.lo |= (1 << (LoBit32(temp));
>        } else  {
>            result.hi |= (1 << (LoBit32(allP & rayTable[srcSq*16+SLIDE_NW1]));
>        }
>    } else  {
>        // srcSq is in hi half --> LoBit32 never reaches lo half.
>        result.hi |= (1 << (LoBit32(allP & rayTable[srcSq*16+SLIDE_NE1]));
>        result.hi |= (1 << (LoBit32(allP & rayTable[srcSq*16+SLIDE_NW1]));
>        unsigned temp = allP & rayTable[srcSq*16 + SLIDE_SE1];
>        if (temp)  {
>            result.hi |= (1 << (HiBit32(temp));
>        } else  {
>            result.lo |= (1 << (HiBit32(allP & rayTable[srcSq*16+SLIDE_SE0]));
>        }
>        temp = allP & rayTable[srcSq*16 + SLIDE_SW1];
>        if (temp)  {
>            result.hi |= (1 << (HiBit32(temp));
>        } else  {
>            result.lo |= (1 << (HiBit32(allP & rayTable[srcSq*16+SLIDE_SW0]));
>        }
>    }
>    result.lo &= enemyP.lo;
>    result.hi &= enemyP.hi;
>    return result;
>}
>
>So that code is sort of analogous to what my assembly code actually does.
>However, it uses the BSF/BSR instructions in place of LoBit32/HiBit32, and it
>keeps the result board in two 32-bit registers and uses the BTS instruction to
>turn on individual bits, instead of stuff like "result.lo |= (1 << x)".
>
>(2) Now, I will try and explain how to shrink the tables from 4K to 2.75K
>without slowing that code down at all...
>    First, observe that if the srcSq is in the lo half of the board, then only
>six of the eight dwords for that square (for diagonal rays) in rayTable are
>used.  Both halves of the NE and NW rays are used, but only the low half of the
>SE and SW rays are used.  If you implement the similar scan_rq_rays and
>make_rq_mask functions, then you will find that the N ray uses both halves,
>while the S, W and E rays only use the lo half.  So in total, only 11 of the 16
>bitboard halves from rayTable for a particular square in the lo half of the
>board are used.  The opposite situation exists for source squares in the hi half
>of the board, and again 11 halves are used.  Suppose we took the constants given
>above and made two sets of constants, one for the lo half of the board and one
>for the hi half:
>enum {
>    SLIDE_LE0,(SLIDE_LE1),   SLIDE_LSE0,(SLIDE_LSE1),
>    SLIDE_LS0,(SLIDE_LS1),   SLIDE_LSW0,(SLIDE_LSW1),
>    SLIDE_LW0,(SLIDE_LW1),   SLIDE_LNW0, SLIDE_LNW1,
>    SLIDE_LN0, SLIDE_LN1,    SLIDE_LNE0, SLIDE_LNE1,
>
>   (SLIDE_HE0),SLIDE_HE1,    SLIDE_HSE0, SLIDE_HSE1,
>    SLIDE_HS0, SLIDE_HS1,    SLIDE_HSW0, SLIDE_HSW1,
>   (SLIDE_HW0),SLIDE_HW1,   (SLIDE_HNW0),SLIDE_HNW1,
>   (SLIDE_HN0),SLIDE_HN1,   (SLIDE_HNE0),SLIDE_HNE1
>};
>Notice that I have put brackets around 10 of these 32 constants.  These are the
>ones that are NEVER USED in the code.  In effect, the code knows that these
>bitboard halves in the rayTable are zero for every square on the proper half of
>the board.  So you can re-number them and make a new enum:
>enum {
>    SLIDE_LE0 =0,  SLIDE_HE1 =0,
>    SLIDE_LW0 =1,  SLIDE_HW1 =1,
>    SLIDE_LN0 =2,  SLIDE_HS0 =2,
>    SLIDE_LN1 =3,  SLIDE_HS1 =3,
>    SLIDE_LS0 =4,  SLIDE_HN1 =4,
>    SLIDE_LSE0=5,  SLIDE_NE1 =5,
>    SLIDE_LSW0=6,  SLIDE_NW1 =6,
>    SLIDE_LNW0=7,  SLIDE_SE0 =7,
>    SLIDE_LNW1=8,  SLIDE_SE1 =8,
>    SLIDE_LNE0=9,  SLIDE_SW0 =9,
>    SLIDE_LNE1=10, SLIDE_SW1 =10
>};
>
>   That isn't exactly the numbering I use; it doesn't really matter what
>numbering you use as long as it matches the contents of your 2816-byte rayTable.
>
>   So, instead of using the *common sub-expression* (srcSq*16) in the rayTable
>index, you would use (srcSq*11).  A non-stupid compiler will of course calculate
>this value ahead of time and re-use it; if you want to be sure, calculate it
>into a temporary and use the temporary throughout the function.
>
>hope some of this is useful,
>Wylie
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.