Computer Chess Club Archives


Search

Terms

Messages

Subject: Ed Schröder's attack bytes or SEE with SSE2?

Author: Gerd Isenberg

Date: 16:35:04 08/11/04


Hi all,

Ed's attack byte idea has the merit to delegate register intensive computing to
pre-calculated table lookup:

char TABLE [12] [256] [256];

With own and opposite attack bytes and target content (piece/empty square in
case of capture/move) to index that array and to get some SEE-like value.

http://members.home.nl/matador/chess840.htm#HW
Evaluation in REBEL (hanging pieces)

----------------------- bitboard centric on ----------------------------

Unfortunately this nice idea don't fits really good my bitboard centric
paradigms ;-)

Because i like to avoid any square indices like from or to, but only sets and
"magic" move numbers.

For instance a set of opposite pawn attacks to empty squares (or defending own
pawns) is a subset of a taboo set of king, queen, rook and light pieces. Some
"attack count" bitboards may easily be computed, at least or exclusive 1,2,3(,4)
attacks, as well as several taboo-sets for king and pieces.

Combining these bitboards with some boolean bitwise expressions detects most
hanging or en prised pieces. SEE like gain-sets e.g. direct rook attack to a
light piece, supported by another rook or more expensive piece (even due to
battery). If the light piece is defended, but only once and not by a pawn, it
becomes member of the en-prise set, rook vs two pieces.

That kind of parallel SEE might be done during huge eval or after trying obvious
winning captures which require less effort like pxQ and are likely to produce a
cutoff. The computation for several sets may be performed in several independent
instruction chains to get some parallel speedup of four or better.

----------------------- bitboard centric off ---------------------------

The SSE2 dot product routines gave me a thought about to get all 64 board attack
bytes on the fly, SIMD-wise from a set of (up to) ten attack bitboards per side.

My suggested byte format is a bit different from Ed's approach.
The intention is the same, to address TABLE and get precalculated SEE values for
all move and capture targets:

Ed's format:

+------+------+------+------+------+------+------+------+  B0-B2 : 2 attackers
| BIT0 | BIT1 | BIT2 | BIT3 | BIT4 | BIT5 | BIT6 | BIT7 |
+------+------+------+------+------+------+------+------+  B3    : white pawn
|      Number of     | PAWN |KNIGHT| ROOK | QUEEN| KING |
|      ATTACKERS     |      |BISHOP|      |      |      |  B4    : white
knight/bishop
+------+------+------+------+------+------+------+------+
|  0   |   1  |  0   |  1   |  1   |  0   |  0   |  0   |
+------+------+------+------+------+------+------+------+


Due to easier SSE2-implementation my suggestion is:
                               -> direction of overflow!
+------+------+------+------+------+------+------+------+
| BIT0 | BIT1 | BIT2 | BIT3 | BIT4 | BIT5 | BIT6 | BIT7 |
+------+------+------+------+------+------+------+------+
|      Number of     |QUEEN |    ROOK #of light  | PAWN |
|      ATTACKERS     |      |      | ATTACKERS   |      |
|   (incl one xray)  |  ATTACKER   |             |      |
+------+------+------+------+------+------+------+------+
|  0   |   1  |  0   |  1   |  1   |  0   |  0   |  0   |
+------+------+------+------+------+------+------+------+

The reason is to add bytewise increments with saturation of 0xff:

0x81 for pawn,
0x21 for bishops and knights,
0x11 for rooks
0x09 for queen
0x01 for king and probably for a set of xray attacks.

Possible drawbacks:

If two pawns attack a single square, an overflow occurs - due to saturation the
result is 255. Further adds don't change that and there is no more
differentiation - 255 means at least two pawns. The interpretation of the three
LSBs as total attack counters is wrong then (7 vs >=2), one should simply not
interprete the values that way - only the whole byte as index.

There are additional possible field overflows, the most common if a square is
attacked by two rooks, there is an overflow to the two bit-light piece counter -
but the total number of attacker takes care of that.
There are even other rare cases where an overflow with saturation occurs, but i
guess one may safely ignore that, if the bytes are only about to index TABLE.

What is your optinion about that format, too ambiguous?

As input for the routine an array or a struct of ten bitboards per side:

struct CAttackParam
{
    BitBoard    pawnAttacksLeft;
    BitBoard    pawnAttacksRight;
    BitBoard    bishopAttacks;
    BitBoard    knight1Attacks;
    BitBoard    knight2Attacks;
    BitBoard    rook1Attacks;
    BitBoard    rook2Attacks;
    BitBoard    queenAttacks;
    BitBoard    kingAttacks;
    BitBoard    xRays;
};

Like dot product the routine does a 64->512 bit expansion inside four xmm
registers, sign extending all 64 bits to 64 bytes.

The routines loops over ten bitboards and associated increments and takes 135ns
best case on my 2.2GHz AMD64, ~300 cycles. Unrolling may safe some cycles - or
not. Hmm..., it is needed twice per node. 600 cycles to get all indices for
white and black.

Producing the rotated file/rank rotated array is cheaper.
May be one may implement a final 90 degree rotation.
May be it becomes more interesting if it, let say takes only
150 cycles for both sides ;-)

OTOH an incremental approach takes also some cycles and has some branches and
miss-predictions here and there - and incremental update must be considered
every make and unmake move, while on the fly is often safed in cheap or lazy
nodes.

Cheers,
Gerd


/* 16 bit alignment required */

void getBoardAttackTable(unsigned char batable[], const BitBoard someAttacks[])
{
    static const BitBoard CACHE_ALIGN consts[4] =
    {
        0x8040201008040201, 0x8040201008040201,
                         0,                  0
    };
    static const BitBoard incs[2*10] =
    {
        0x8181818181818181, 0x8181818181818181, // pawn
        0x8181818181818181, 0x8181818181818181,
        0x2121212121212121, 0x2121212121212121, // light piece
        0x2121212121212121, 0x2121212121212121,
        0x2121212121212121, 0x2121212121212121,
        0x1111111111111111, 0x1111111111111111, // rook
        0x1111111111111111, 0x1111111111111111,
        0x0909090909090909, 0x0909090909090909,  // queen
        0x0101010101010101, 0x0101010101010101,  // king
        0x0101010101010101, 0x0101010101010101,  // some indirect attack
    };

    __asm
    {
        mov     eax, [someAttacks]
        lea     ebx, [incs]
        lea     edx, [consts]
        pxor    xmm4, xmm4
        pxor    xmm5, xmm5
        pxor    xmm6, xmm6
        pxor    xmm7, xmm7     ; xmm7...xmm4 result := 0
        lea     esi, [edx+16]  ; &null
        mov     ecx, 10        ; for 10 bitboards
nextset:
        movq    xmm0, qword ptr [eax] ; get attack bitboard
        add     eax, 8         ; ++ptr2bb
        punpcklbw  xmm0, xmm0
        movdqa     xmm2, xmm0
        punpcklwd  xmm0, xmm0
        punpckhwd  xmm2, xmm2
        movdqa     xmm1, xmm0
        movdqa     xmm3, xmm2
        punpckldq  xmm0, xmm0
        punpckhdq  xmm1, xmm1
        punpckldq  xmm2, xmm2
        punpckhdq  xmm3, xmm3   ; 64->512 bit byte sign extension
        pandn   xmm0, [edx]     ; bits
        pandn   xmm1, [edx]
        pandn   xmm2, [edx]
        pandn   xmm3, [edx]
        pcmpeqb xmm0, [esi]     ; bits != null
        pcmpeqb xmm1, [esi]
        pcmpeqb xmm2, [esi]
        pcmpeqb xmm3, [esi]
        pand    xmm0, [ebx]     ; mask the piece specific summands
        pand    xmm1, [ebx]
        pand    xmm2, [ebx]
        pand    xmm3, [ebx]
        add     ebx, 16
        dec     ecx
        paddusb xmm4, xmm0      ; add 16 bytes with saturation
        paddusb xmm5, xmm1      ; four times
        paddusb xmm6, xmm2
        paddusb xmm7, xmm3
        jnz     nextset
        ; store the 64 bytes
        mov     eax, [batable]
        movdqa  [eax+0*16], xmm4
        movdqa  [eax+1*16], xmm5
        movdqa  [eax+2*16], xmm6
        movdqa  [eax+3*16], xmm7
    }
}



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.