Author: Sune Fischer
Date: 06:35:49 02/03/02
Go up one level in this thread
On February 03, 2002 at 04:06:54, Wylie Garvin wrote:
>Hi Sune (and any else),
>
> I'm going to have another shot at explaining the attack generation technique
>I use on x86. Note that I still haven't had a full version of this running to
>test the speed of it yet; my version is written in assembly. I have no idea if
>it performs well when expressed in C, but it's probably at least as fast as any
>of the other techniques and requires only a few kilobytes of table data.
>
> First, imagine you have these C declarations:
>
>enum { SLIDE_E,SLIDE_SE,SLIDE_S,SLIDE_SW,SLIDE_W,SLIDE_NW,SLIDE_N,SLIDE_NE };
>typedef unsigned long long BB;
>
>BB rayTable[64*8]; // 4096 bytes
>
>unsigned LoBit(BB bb); // i.e. like the BSF instruction, but for 64 bits.
>unsigned HiBit(BB bb); // i.e. like the BSR instruction, but for 64 bits.
>// assume that LoBit and HiBit return garbage if (!bb).
>
> For this discussion, assume square H1 is 0, G1 is 1, .. B1 is 6, A1 is 7, H2
>is 8, ... H7 is 48, .. H8 is 56, .. A8 is 63. In that case, the offset you add
>to a square number to move in the corresponding direction (assuming you aren't
>at the appropriate edge of the board already!) is:
> Direction Offset Direction Offset
> E -1 W +1
> SE -9 NW +9
> S -8 N +8
> SW -7 NE +7
>You should open up WinBoard and imagine my numbering scheme and verify these
>offsets yourself. Then, imagine you have a table with 64*8 "ray masks", one for
>each origin square and direction. Now suppose the direction was E,SE,S or SW.
>Then you would do something like this:
> BB rayMask = rayTable[square*8+SLIDE_NE];
> BB allPieces = ...;
> unsigned rayEnd = LoBit(rayMask & allPieces);
>For a ray direction like SLIDE_NE, which has a positive offset, LoBit is
>appropriate. Since the source bit is clear, and all the bits below it are not
>on the ray, and all the bits *not on the ray* are clear, the result of the LoBit
>operation will be the square number of the *first piece* you bump into as you
>slide along the ray. If the ray had a negative offset, you would use HiBit
>instead.
>
>Now, the second part is, if a ray mask would otherwise be empty, set the source
>square's bit in it. This guarantees there will be something for LoBit/HiBit to
>find on the "ray". The source square must NOT be included if there are any
>other squares on the ray though, or it will be hit every time.
>
>So, some C code to figure out what squares are attacked might look like:
>
>// disclaimer: all this code is not tested and not optimized!!
>
>// This is sort of like the assembly I use to choose captures to generate.
>BB scan_bq_rays(unsigned srcSquare, BB allPieces, BB enemyPieces) {
> BB result = 0L;
> result |= (1L<<(LoBit(allPieces & rayTable[srcSquare*8+SLIDE_NE]));
> result |= (1L<<(LoBit(allPieces & rayTable[srcSquare*8+SLIDE_NW]));
> result |= (1L<<(HiBit(allPieces & rayTable[srcSquare*8+SLIDE_SE]));
> result |= (1L<<(HiBit(allPieces & rayTable[srcSquare*8+SLIDE_SW]));
> return (result & enemyPieces);
>}
>BB scan_rq_rays(unsigned srcSquare, BB allPieces, BB enemyPieces) {
> // ..similarly..
>}
>
>// This is sort of like the assembly I use when generating non-captures.
>BB make_bq_mask(unsigned srcSquare, BB allPieces) {
> BB result = 0L;
> unsigned sqNe = LoBit(allPieces & rayTable[srcSquare*8+SLIDE_NE]);
> unsigned sqSw = HiBit(allPieces & rayTable[srcSquare*8+SLIDE_SW]);
> result |= (rayTable[sqNe*8+SLIDE_SW] & rayTable[sqSw*8+SLIDE_NE];
> unsigned sqNw = LoBit(allPieces & rayTable[srcSquare*8+SLIDE_NW]);
> unsigned sqSe = HiBit(allPieces & rayTable[srcSquare*8+SLIDE_SE]);
> result |= (rayTable[sqNw*8+SLIDE_SE] & rayTable[sqSe*8+SLIDE_NW];
> return (result & ~allPieces);
>}
>BB make_rq_mask(unsigned srcSquare, BB allPieces) {
> // ..similarly..
>}
>
>That all should be easier to understand than the first half of my message
>before. :)
>
>As to the second half, here's what that was getting at:
> (1) Suppose, instead of LoBit() and HiBit() operations, you had LoBit32() and
>HiBit32() operations that worked on a 32-bit unsigned. Then you might end up
>doing stuff more like this:
>
>enum {
> SLIDE_E0, SLIDE_E1, SLIDE_SE0, SLIDE_SE1,
> SLIDE_S0, SLIDE_S1, SLIDE_SW0, SLIDE_SW1,
> SLIDE_W0, SLIDE_W1, SLIDE_NW0, SLIDE_NW1,
> SLIDE_N0, SLIDE_N1, SLIDE_NE0, SLIDE_NE1
>};
>typedef unsigned long long BB;
>
>unsigned rayTable[64*16]; // 4096 bytes
>
>BB scan_bq_rays(unsigned srcSq, BB allP, BB enemyP) {
> BB result;
> result.lo = result.hi = 0;
> if (srcSq < 32) {
> // srcSq is in lo half --> HiBit32 never reaches hi half.
> result.lo |= (1 << (HiBit32(allP & rayTable[srcSq*16+SLIDE_SE0]));
> result.lo |= (1 << (HiBit32(allP & rayTable[srcSq*16+SLIDE_SW0]));
> unsigned temp = allP & rayTable[srcSq*16 + SLIDE_NE0];
> if (temp) {
> result.lo |= (1 << (LoBit32(temp));
> } else {
> result.hi |= (1 << (LoBit32(allP & rayTable[srcSq*16+SLIDE_NE1]));
> }
> temp = allP & rayTable[srcSq*16 + SLIDE_NW0];
> if (temp) {
> result.lo |= (1 << (LoBit32(temp));
> } else {
> result.hi |= (1 << (LoBit32(allP & rayTable[srcSq*16+SLIDE_NW1]));
> }
> } else {
> // srcSq is in hi half --> LoBit32 never reaches lo half.
> result.hi |= (1 << (LoBit32(allP & rayTable[srcSq*16+SLIDE_NE1]));
> result.hi |= (1 << (LoBit32(allP & rayTable[srcSq*16+SLIDE_NW1]));
> unsigned temp = allP & rayTable[srcSq*16 + SLIDE_SE1];
> if (temp) {
> result.hi |= (1 << (HiBit32(temp));
> } else {
> result.lo |= (1 << (HiBit32(allP & rayTable[srcSq*16+SLIDE_SE0]));
> }
> temp = allP & rayTable[srcSq*16 + SLIDE_SW1];
> if (temp) {
> result.hi |= (1 << (HiBit32(temp));
> } else {
> result.lo |= (1 << (HiBit32(allP & rayTable[srcSq*16+SLIDE_SW0]));
> }
> }
> result.lo &= enemyP.lo;
> result.hi &= enemyP.hi;
> return result;
>}
>
>So that code is sort of analogous to what my assembly code actually does.
>However, it uses the BSF/BSR instructions in place of LoBit32/HiBit32, and it
>keeps the result board in two 32-bit registers and uses the BTS instruction to
>turn on individual bits, instead of stuff like "result.lo |= (1 << x)".
>
>(2) Now, I will try and explain how to shrink the tables from 4K to 2.75K
>without slowing that code down at all...
> First, observe that if the srcSq is in the lo half of the board, then only
>six of the eight dwords for that square (for diagonal rays) in rayTable are
>used. Both halves of the NE and NW rays are used, but only the low half of the
>SE and SW rays are used. If you implement the similar scan_rq_rays and
>make_rq_mask functions, then you will find that the N ray uses both halves,
>while the S, W and E rays only use the lo half. So in total, only 11 of the 16
>bitboard halves from rayTable for a particular square in the lo half of the
>board are used. The opposite situation exists for source squares in the hi half
>of the board, and again 11 halves are used. Suppose we took the constants given
>above and made two sets of constants, one for the lo half of the board and one
>for the hi half:
>enum {
> SLIDE_LE0,(SLIDE_LE1), SLIDE_LSE0,(SLIDE_LSE1),
> SLIDE_LS0,(SLIDE_LS1), SLIDE_LSW0,(SLIDE_LSW1),
> SLIDE_LW0,(SLIDE_LW1), SLIDE_LNW0, SLIDE_LNW1,
> SLIDE_LN0, SLIDE_LN1, SLIDE_LNE0, SLIDE_LNE1,
>
> (SLIDE_HE0),SLIDE_HE1, SLIDE_HSE0, SLIDE_HSE1,
> SLIDE_HS0, SLIDE_HS1, SLIDE_HSW0, SLIDE_HSW1,
> (SLIDE_HW0),SLIDE_HW1, (SLIDE_HNW0),SLIDE_HNW1,
> (SLIDE_HN0),SLIDE_HN1, (SLIDE_HNE0),SLIDE_HNE1
>};
>Notice that I have put brackets around 10 of these 32 constants. These are the
>ones that are NEVER USED in the code. In effect, the code knows that these
>bitboard halves in the rayTable are zero for every square on the proper half of
>the board. So you can re-number them and make a new enum:
>enum {
> SLIDE_LE0 =0, SLIDE_HE1 =0,
> SLIDE_LW0 =1, SLIDE_HW1 =1,
> SLIDE_LN0 =2, SLIDE_HS0 =2,
> SLIDE_LN1 =3, SLIDE_HS1 =3,
> SLIDE_LS0 =4, SLIDE_HN1 =4,
> SLIDE_LSE0=5, SLIDE_NE1 =5,
> SLIDE_LSW0=6, SLIDE_NW1 =6,
> SLIDE_LNW0=7, SLIDE_SE0 =7,
> SLIDE_LNW1=8, SLIDE_SE1 =8,
> SLIDE_LNE0=9, SLIDE_SW0 =9,
> SLIDE_LNE1=10, SLIDE_SW1 =10
>};
>
> That isn't exactly the numbering I use; it doesn't really matter what
>numbering you use as long as it matches the contents of your 2816-byte rayTable.
>
> So, instead of using the *common sub-expression* (srcSq*16) in the rayTable
>index, you would use (srcSq*11). A non-stupid compiler will of course calculate
>this value ahead of time and re-use it; if you want to be sure, calculate it
>into a temporary and use the temporary throughout the function.
>
>hope some of this is useful,
>Wylie
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.