Author: Gerd Isenberg
Date: 13:48:39 05/03/05
Calling serialize with -1 bitboard (all bits set) takes 488 cycles best case
(all in cache) on my AMD64 in 32-bit mode, msc2005ß. Bitscan with bsf pair takes
1364 cycles. Puhh, something like 8 versus 21 cycles per bitscan call.
int serialize(BitBoard bb, int sq[], int foo)
{
for (int n=0; bb; sq[n++] = bitScanWithReset(bb) | foo);
return n;
}
__forceinline
int bitScanWithReset(BitBoard &bb)
{
static const unsigned int CACHE_ALIGN lsz64_tbl[64] =
{
63,30, 3,32,59,14,11,33,
60,24,50, 9,55,19,21,34,
61,29, 2,53,51,23,41,18,
56,28, 1,43,46,27, 0,35,
62,31,58, 4, 5,49,54, 6,
15,52,12,40, 7,42,45,16,
25,57,48,13,10,39, 8,44,
20,47,38,22,17,37,36,26,
};
BitBoard b = bb ^ (bb - 1);
bb = bb & (bb - 1); // only one further and
unsigned int fold = ((int) b) ^ ((int)(b>>32));
return lsz64_tbl[(fold * 0x78291ACF) >> (32-6)];
}
With the cost to store bb-1 in one additional register (pair) the improvement is
to save the "not" instruction in bb = bb & ~b to reset the found bit - at least
the above traverse loop keeps all in registers:
?serialize@@YAH_KQAHH@Z PROC ; serialize
00510 53 push ebx
00511 8b 5c 24 0c mov ebx, DWORD PTR _bb$[esp+4]
00515 57 push edi
00516 8b 7c 24 0c mov edi, DWORD PTR _bb$[esp+4]
0051a 8b cf mov ecx, edi
0051c 33 c0 xor eax, eax
0051e 0b cb or ecx, ebx
00520 74 40 je SHORT $LN1@serialize
00522 55 push ebp
00523 56 push esi
$LL3@serialize:
00524 8b d7 mov edx, edi
00526 83 c2 ff add edx, -1
00529 8b f3 mov esi, ebx
0052b 83 d6 ff adc esi, -1
0052e 8b ea mov ebp, edx
00530 33 ef xor ebp, edi
00532 8b ce mov ecx, esi
00534 33 cb xor ecx, ebx
00536 33 cd xor ecx, ebp
00538 69 c9 cf 1a 29 78 imul ecx, 78291acfH
0053e 23 fa and edi, edx
00540 c1 e9 1a shr ecx, 26
00543 8b 14 8d 00 00 00 00 mov edx, DWORD PTR ...[ecx*4]
0054a 0b 54 24 20 or edx, DWORD PTR _foo$[esp+12]
0054e 8b 4c 24 1c mov ecx, DWORD PTR _sq$[esp+12]
00552 89 14 81 mov DWORD PTR [ecx+eax*4], edx
00555 23 de and ebx, esi
00557 8b d7 mov edx, edi
00559 83 c0 01 add eax, 1
0055c 0b d3 or edx, ebx
0055e 75 c4 jne SHORT $LL3@serialize
00560 5e pop esi
00561 5d pop ebp
$LN1@serialize:
00562 5f pop edi
00563 5b pop ebx
00564 c3 ret 0
?serialize@@YAH_KQAHH@Z ENDP ; serialize
Cheers,
Gerd
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.