Computer Chess Club Archives




Subject: A SIMD idea, eg. Piece/Gain of a capture target

Author: Gerd Isenberg

Date: 10:20:18 01/21/04

Hi, all interested in bitboards and SIMD-instructions ;-)

With bitboards the usual, obvious way to get the piece/gain of a (single)
capture target, is bitscan and two lookups:

   square = bitScan(t);
   piece  = board[square];
   gain   = pieceValue[piece];

If you (like me) intend to avoid square-centric at all (bitscan and incremental
updated board array), you may either traverse disjoint target sets, or you may
do something like this:

// precondition: t is single member of opposite pieces (but no king)

   if      ( t & Pawns )  {piece = pawn;    gain = ValPawn;}
   else if ( t & Rooks )  {piece = rook;    gain = ValRook;}
   else if ( t & Knights ){piece = knight;  gain = ValKnight;}
   else if ( t & Bishops ){piece = bishop;  gain = ValBishop;}
   else /* queen */       {piece = queen;   gain = ValQueen;}

For this purpose or similar branch structures, i figured out following
SSE2-"algo" based on "parallel compare", "parallel and" in conjunction with
"parallel Sum of Absolute Differences", i like to introduce here:

For all targets (eg. once per capture generation):

 To perform a single 32-bit compare between masked high and low-board,
 it is favorably to have four piece-bitboards
 in a high/low separated (xmm-register) representation:

 xmm6 ::= <ql,rl:bl,kl>  -  LowBoard of queens,rooks,bishops and knights
 xmm7 ::= <qh,rh:bh,kh>  - HighBoard of queens,rooks,bishops and knights

 Some piece related 16-bit constants (see below):

 xmm4 ::= <MQ16,MR16,MB16,MK16: 4,3,2,1>  // 16-bit word wise
 xmm5 ::= <SQ16,SR16,SB16,SK16: 4,3,2,1>

For each capture target:

 xmm0 ::= <0:target> == <0,0:th,tl> - single target bit

  // unpack the target
  PUNPCKLDQ  xmm0, xmm0 ; <th,th:tl,tl>
  MOVDQA     xmm1, xmm0
  PUNPCKLQDQ xmm0, xmm0 ; <tl,tl:tl,tl>
  PUNPCKHQDQ xmm1, xmm1 ; <th,th:th,th>

  // perform the ands with piece sets
  PAND       xmm0, xmm6 ; <tl,tl:tl,tl> &= <ql,rl:bl,kl>
  PAND       xmm1, xmm7 ; <th,th:th,th> &= <qh,rh:bh,kh>

  // compare high-low, which is equal for (three or four) empty sets
  PCMPEQD    xmm0, xmm1 ; <!Q32,!R32:!B32,!K32>
  PACKSSDW   xmm0, xmm0 ; <!Q16,!R16,!B16,!K16:!Q16,!R16,!B16,!K16>

  // mask and sum absolute byte difference to get the final values
  PAND       xmm0, xmm4 ; piece specific mask
  PSADBW     xmm0, xmm5 ; < gain/2 : pieceCode >


The <4,3,2,1> constants in the lower quad word of xmm4,xmm5
leave the following codes:

  0 - Pawn, 1 - Knight, 2 - Bishop, 3 - Rook, 4 - Queen

To get the (half of) value of a piece, it is necessary to have some other
constants, to performe the "and" and "Packed Sum of Absolute Differences of
Bytes" trick:

Mi16 = 256*Mi8 + Mi8; a sequence of two bytes
Si16 = 256*Si8 + Si8; a sequence of two bytes

Mi8 = (2*ValuePiece[i] -   ValuePawn + 3) div 8
Si8 = (4*ValuePiece[i] - 3*ValuePawn + 7) div 16

Because the max of absolute byte difference is 255, and "only" two byte
differences are added per piece, the maximum value is 510, therefore gain/2 is
computed, with value of a queen <= 1020.

Similar code is possible to consider SEE-like values for that target,

Some further comments on the instructions: Unfortunately there is one 4-cycle
vector path instruction on AMD64, PACKSSDW. I found no fast replacement for
that. All other instructions are "double direct path" instructions, four bytes
each. PSADBW takes 4 cycles, all other 2 cycles.

The unpack sequence to build the <th,th:th,th> <tl,tl:tl,tl> vector may be
computed with two PSHUFD instead of four unpack/mov-instructions, but PSHUFD is
4-cycle vector path too.

Comments, suggestions and improvements welcome of course.


This page took 0.02 seconds to execute

Last modified: Thu, 07 Jul 11 08:48:38 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.