Computer Chess Club Archives

Search

Terms

Messages

Subject: Re: Matt Taylor's Bitscan with reset improved

Author: Gerd Isenberg

Date: 15:04:29 05/04/05

On May 04, 2005 at 15:37:38, Walter Faxon wrote:
<snip>
>
>Hi, Gerd.
>
>Can you see a way using inline assembly to use a condition appearing while you
>extract the bit to avoid testing the whole bb for zero in an outer loop?
>
>-- Walter

Hi Walter,

my constrain for that "unfair" bench was a single loop with 64-bit log2 ;-)

Yes, two separate 32-bit while-loops with the boolean intrinsics are faster.
697 cycles instead of 1100. As expected dominated by the latency of one bsf per
set bit with good "predictable" bitboard pattern.

Since i may have several loops of that kind i like to write those kind of code
portable in 64-bit manner, but eventually processor specific instances of
bitscan routines in some header files.

I was curious about the very fast AMD64 32-bit imul after playing around with
matt's code after a while with msvc 2005 beta 32-bit compiler.

Here the number of cycles (rdtsc, including call overhead) of the two serialize
functions. The first column based on Matt's bitscan and the for-loop, the second
the bsf-intrinsic form below with some constant bb-pattern:

pattern            #bits  Matt Intr. % slower
0xffffffffffffffff 64 bit  488  697  43%
0xff00ff00ff00ff00 32 bit  263  379  44%
0x0000000000000001  1 bit   29   39  34%
0x1234567887654321 26 bit  219  316  44%

Cheers,
Gerd


int serialize(BitBoard bb, int sq[], int foo)
{
  unsigned long idx;
  unsigned int b = (int)bb;
  int n = 0;
  while ( _BitScanForward(&idx, b) )
  {
    sq[n++] = idx;
    b &= b-1;
  }
  b = (int)(bb>>32);
  while ( _BitScanForward(&idx, b) )
  {
    sq[n++] = idx+32;
    b &= b-1;
  }
  return n;
}

the generated assembly:

?serialize@@YAH_KQAHH@Z PROC				; serialize
  00510	8b 4c 24 04	 mov	 ecx, DWORD PTR _bb$[esp-4]
  00514	33 c0		 xor	 eax, eax
  00516	0f bc d1	 bsf	 edx, ecx
  00519	56		 push	 esi
  0051a	8b 74 24 10	 mov	 esi, DWORD PTR _sq$[esp]
  0051e	74 10		 je	 SHORT $LN3@serialize
$LL4@serialize:
  00520	89 14 86	 mov	 DWORD PTR [esi+eax*4], edx
  00523	8d 51 ff	 lea	 edx, DWORD PTR [ecx-1]
  00526	23 ca		 and	 ecx, edx
  00528	83 c0 01	 add	 eax, 1
  0052b	0f bc d1	 bsf	 edx, ecx
  0052e	75 f0		 jne	 SHORT $LL4@serialize
$LN3@serialize:
  00530	8b 4c 24 0c	 mov	 ecx, DWORD PTR _bb$[esp+4]
  00534	0f bc d1	 bsf	 edx, ecx
  00537	74 1a		 je	 SHORT $LN1@serialize
  00539	8d a4 24 00 00
	00 00		 npad	 7
$LL2@serialize:
  00540	83 c2 20	 add	 edx, 32
  00543	89 14 86	 mov	 DWORD PTR [esi+eax*4], edx
  00546	8d 51 ff	 lea	 edx, DWORD PTR [ecx-1]
  00549	23 ca		 and	 ecx, edx
  0054b	83 c0 01	 add	 eax, 1
  0054e	0f bc d1	 bsf	 edx, ecx
  00551	75 ed		 jne	 SHORT $LL2@serialize
$LN1@serialize:
  00553	5e		 pop	 esi
  00554	c3		 ret	 0
?serialize@@YAH_KQAHH@Z ENDP				; serialize

correction Gerd Isenberg 15:29:31 05/04/05

This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.