Author: Gerd Isenberg
Date: 10:20:18 01/21/04
Hi, all interested in bitboards and SIMD-instructions ;-)
With bitboards the usual, obvious way to get the piece/gain of a (single)
capture target, is bitscan and two lookups:
square = bitScan(t);
piece = board[square];
gain = pieceValue[piece];
If you (like me) intend to avoid square-centric at all (bitscan and incremental
updated board array), you may either traverse disjoint target sets, or you may
do something like this:
// precondition: t is single member of opposite pieces (but no king)
if ( t & Pawns ) {piece = pawn; gain = ValPawn;}
else if ( t & Rooks ) {piece = rook; gain = ValRook;}
else if ( t & Knights ){piece = knight; gain = ValKnight;}
else if ( t & Bishops ){piece = bishop; gain = ValBishop;}
else /* queen */ {piece = queen; gain = ValQueen;}
For this purpose or similar branch structures, i figured out following
SSE2-"algo" based on "parallel compare", "parallel and" in conjunction with
"parallel Sum of Absolute Differences", i like to introduce here:
----------------------------------------------------------------------------
For all targets (eg. once per capture generation):
To perform a single 32-bit compare between masked high and low-board,
it is favorably to have four piece-bitboards
in a high/low separated (xmm-register) representation:
xmm6 ::= <ql,rl:bl,kl> - LowBoard of queens,rooks,bishops and knights
xmm7 ::= <qh,rh:bh,kh> - HighBoard of queens,rooks,bishops and knights
Some piece related 16-bit constants (see below):
xmm4 ::= <MQ16,MR16,MB16,MK16: 4,3,2,1> // 16-bit word wise
xmm5 ::= <SQ16,SR16,SB16,SK16: 4,3,2,1>
----------------------------------------------------------------------------
For each capture target:
xmm0 ::= <0:target> == <0,0:th,tl> - single target bit
// unpack the target
PUNPCKLDQ xmm0, xmm0 ; <th,th:tl,tl>
MOVDQA xmm1, xmm0
PUNPCKLQDQ xmm0, xmm0 ; <tl,tl:tl,tl>
PUNPCKHQDQ xmm1, xmm1 ; <th,th:th,th>
// perform the ands with piece sets
PAND xmm0, xmm6 ; <tl,tl:tl,tl> &= <ql,rl:bl,kl>
PAND xmm1, xmm7 ; <th,th:th,th> &= <qh,rh:bh,kh>
// compare high-low, which is equal for (three or four) empty sets
PCMPEQD xmm0, xmm1 ; <!Q32,!R32:!B32,!K32>
PACKSSDW xmm0, xmm0 ; <!Q16,!R16,!B16,!K16:!Q16,!R16,!B16,!K16>
// mask and sum absolute byte difference to get the final values
PAND xmm0, xmm4 ; piece specific mask
PSADBW xmm0, xmm5 ; < gain/2 : pieceCode >
----------------------------------------------------------------------------
The <4,3,2,1> constants in the lower quad word of xmm4,xmm5
leave the following codes:
0 - Pawn, 1 - Knight, 2 - Bishop, 3 - Rook, 4 - Queen
To get the (half of) value of a piece, it is necessary to have some other
constants, to performe the "and" and "Packed Sum of Absolute Differences of
Bytes" trick:
Mi16 = 256*Mi8 + Mi8; a sequence of two bytes
Si16 = 256*Si8 + Si8; a sequence of two bytes
Mi8 = (2*ValuePiece[i] - ValuePawn + 3) div 8
Si8 = (4*ValuePiece[i] - 3*ValuePawn + 7) div 16
Because the max of absolute byte difference is 255, and "only" two byte
differences are added per piece, the maximum value is 510, therefore gain/2 is
computed, with value of a queen <= 1020.
Similar code is possible to consider SEE-like values for that target,
Some further comments on the instructions: Unfortunately there is one 4-cycle
vector path instruction on AMD64, PACKSSDW. I found no fast replacement for
that. All other instructions are "double direct path" instructions, four bytes
each. PSADBW takes 4 cycles, all other 2 cycles.
The unpack sequence to build the <th,th:th,th> <tl,tl:tl,tl> vector may be
computed with two PSHUFD instead of four unpack/mov-instructions, but PSHUFD is
4-cycle vector path too.
Comments, suggestions and improvements welcome of course.
Cheers,
Gerd
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.