Computer Chess Club Archives


Search

Terms

Messages

Subject: Matt Taylor's Bitscan with reset improved

Author: Gerd Isenberg

Date: 13:48:39 05/03/05


Calling serialize with -1 bitboard (all bits set) takes 488 cycles best case
(all in cache) on my AMD64 in 32-bit mode, msc2005ß. Bitscan with bsf pair takes
1364 cycles. Puhh, something like 8 versus 21 cycles per bitscan call.

int serialize(BitBoard bb, int sq[], int foo)
{
  for (int n=0; bb; sq[n++] = bitScanWithReset(bb) | foo);
  return n;
}

__forceinline
int bitScanWithReset(BitBoard &bb)
{
  static const unsigned int CACHE_ALIGN lsz64_tbl[64] =
  {
    63,30, 3,32,59,14,11,33,
    60,24,50, 9,55,19,21,34,
    61,29, 2,53,51,23,41,18,
    56,28, 1,43,46,27, 0,35,
    62,31,58, 4, 5,49,54, 6,
    15,52,12,40, 7,42,45,16,
    25,57,48,13,10,39, 8,44,
    20,47,38,22,17,37,36,26,
  };

  BitBoard b = bb ^ (bb - 1);
          bb = bb & (bb - 1); // only one further and
  unsigned int fold = ((int) b) ^ ((int)(b>>32));
  return  lsz64_tbl[(fold * 0x78291ACF) >> (32-6)];
}

With the cost to store bb-1 in one additional register (pair) the improvement is
to save the "not" instruction in bb = bb & ~b to reset the found bit - at least
the above traverse loop keeps all in registers:

?serialize@@YAH_KQAHH@Z PROC			; serialize
  00510	53		 push	 ebx
  00511	8b 5c 24 0c	 mov	 ebx, DWORD PTR _bb$[esp+4]
  00515	57		 push	 edi
  00516	8b 7c 24 0c	 mov	 edi, DWORD PTR _bb$[esp+4]
  0051a	8b cf		 mov	 ecx, edi
  0051c	33 c0		 xor	 eax, eax
  0051e	0b cb		 or	 ecx, ebx
  00520	74 40		 je	 SHORT $LN1@serialize
  00522	55		 push	 ebp
  00523	56		 push	 esi
$LL3@serialize:
  00524	8b d7		 mov	 edx, edi
  00526	83 c2 ff	 add	 edx, -1
  00529	8b f3		 mov	 esi, ebx
  0052b	83 d6 ff	 adc	 esi, -1
  0052e	8b ea		 mov	 ebp, edx
  00530	33 ef		 xor	 ebp, edi
  00532	8b ce		 mov	 ecx, esi
  00534	33 cb		 xor	 ecx, ebx
  00536	33 cd		 xor	 ecx, ebp
  00538	69 c9 cf 1a 29 78 imul	 ecx, 78291acfH
  0053e	23 fa		 and	 edi, edx
  00540	c1 e9 1a	 shr	 ecx, 26
  00543	8b 14 8d 00 00 00 00 mov edx, DWORD PTR ...[ecx*4]
  0054a	0b 54 24 20	 or	 edx, DWORD PTR _foo$[esp+12]
  0054e	8b 4c 24 1c	 mov	 ecx, DWORD PTR _sq$[esp+12]
  00552	89 14 81	 mov	 DWORD PTR [ecx+eax*4], edx
  00555	23 de		 and	 ebx, esi
  00557	8b d7		 mov	 edx, edi
  00559	83 c0 01	 add	 eax, 1
  0055c	0b d3		 or	 edx, ebx
  0055e	75 c4		 jne	 SHORT $LL3@serialize
  00560	5e		 pop	 esi
  00561	5d		 pop	 ebp
$LN1@serialize:
  00562	5f		 pop	 edi
  00563	5b		 pop	 ebx
  00564	c3		 ret	 0
?serialize@@YAH_KQAHH@Z ENDP			; serialize


Cheers,
Gerd



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.