Author: Gerd Isenberg
Date: 15:04:29 05/04/05
Go up one level in this thread
On May 04, 2005 at 15:37:38, Walter Faxon wrote:
<snip>
>
>Hi, Gerd.
>
>Can you see a way using inline assembly to use a condition appearing while you
>extract the bit to avoid testing the whole bb for zero in an outer loop?
>
>-- Walter
Hi Walter,
my constrain for that "unfair" bench was a single loop with 64-bit log2 ;-)
Yes, two separate 32-bit while-loops with the boolean intrinsics are faster.
697 cycles instead of 1100. As expected dominated by the latency of one bsf per
set bit with good "predictable" bitboard pattern.
Since i may have several loops of that kind i like to write those kind of code
portable in 64-bit manner, but eventually processor specific instances of
bitscan routines in some header files.
I was curious about the very fast AMD64 32-bit imul after playing around with
matt's code after a while with msvc 2005 beta 32-bit compiler.
Here the number of cycles (rdtsc, including call overhead) of the two serialize
functions. The first column based on Matt's bitscan and the for-loop, the second
the bsf-intrinsic form below with some constant bb-pattern:
pattern #bits Matt Intr. % slower
0xffffffffffffffff 64 bit 488 697 43%
0xff00ff00ff00ff00 32 bit 263 379 44%
0x0000000000000001 1 bit 29 39 34%
0x1234567887654321 26 bit 219 316 44%
Cheers,
Gerd
int serialize(BitBoard bb, int sq[], int foo)
{
unsigned long idx;
unsigned int b = (int)bb;
int n = 0;
while ( _BitScanForward(&idx, b) )
{
sq[n++] = idx;
b &= b-1;
}
b = (int)(bb>>32);
while ( _BitScanForward(&idx, b) )
{
sq[n++] = idx+32;
b &= b-1;
}
return n;
}
the generated assembly:
?serialize@@YAH_KQAHH@Z PROC ; serialize
00510 8b 4c 24 04 mov ecx, DWORD PTR _bb$[esp-4]
00514 33 c0 xor eax, eax
00516 0f bc d1 bsf edx, ecx
00519 56 push esi
0051a 8b 74 24 10 mov esi, DWORD PTR _sq$[esp]
0051e 74 10 je SHORT $LN3@serialize
$LL4@serialize:
00520 89 14 86 mov DWORD PTR [esi+eax*4], edx
00523 8d 51 ff lea edx, DWORD PTR [ecx-1]
00526 23 ca and ecx, edx
00528 83 c0 01 add eax, 1
0052b 0f bc d1 bsf edx, ecx
0052e 75 f0 jne SHORT $LL4@serialize
$LN3@serialize:
00530 8b 4c 24 0c mov ecx, DWORD PTR _bb$[esp+4]
00534 0f bc d1 bsf edx, ecx
00537 74 1a je SHORT $LN1@serialize
00539 8d a4 24 00 00
00 00 npad 7
$LL2@serialize:
00540 83 c2 20 add edx, 32
00543 89 14 86 mov DWORD PTR [esi+eax*4], edx
00546 8d 51 ff lea edx, DWORD PTR [ecx-1]
00549 23 ca and ecx, edx
0054b 83 c0 01 add eax, 1
0054e 0f bc d1 bsf edx, ecx
00551 75 ed jne SHORT $LL2@serialize
$LN1@serialize:
00553 5e pop esi
00554 c3 ret 0
?serialize@@YAH_KQAHH@Z ENDP ; serialize
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.