Computer Chess Club Archives


Subject: Re: is this really faster?

Author: Gerd Isenberg

Date: 02:29:13 04/19/03



On April 18, 2003 at 15:11:34, Dieter Buerssner wrote:

>On April 18, 2003 at 03:17:52, Gerd Isenberg wrote:
>
>>static __forceinline int _fastcall BitCount8 (BitBoard bb)
>>{
>>	__asm
>>	{
>>		mov     ecx, dword ptr bb
>>		xor     eax, eax
>>		test    ecx, ecx
>>		jz      l1
>>	    l0: lea     edx, [ecx-1]
>>		inc     eax
>>		and     ecx, edx
>>		jnz     l0
>>	    l1: mov     ecx, dword ptr bb+4
>>		test    ecx, ecx
>>		jz      l3
>>	    l2: lea     edx, [ecx-1]
>>		inc     eax
>>		and     ecx, edx
>>		jnz     l2
>>	    l3:
>>	}
>>}
>
>Gerd, did you check whether this is really faster than the corresponding C code?
>
>int PopCount(BITBOARD a)
>{
>  unsigned long w;
>  int n = 0;
>  w = *(unsigned long *)&a;
>  if (w)
>  do
>  {
>    n++;
>  }
>  while ((w &= w-1) != 0);
>  w = *(((unsigned long *)&a)+1);
>  if (w)
>  do
>  {
>    n++;
>  }
>  while ((w &= w-1) != 0);
>  return n;
>}
>
>This produces almost exactly your assembler code. The nasty casts are there
>because the compiler did not optimize the corresponding code well:
>
>  w = a & 0xffffffffUL;
>
>and
>
>  w = (a >> 32) & 0xffffffffUL;
>
>which would even make the code portable (even to architectures where unsigned
>long has more than 32 bits; otherwise the masks will be optimized away).
>
>Other tricks are possible to convince the compiler to produce good code (like
>using a union, though that is also not portable). The C code might even be
>faster when inlined: the bitboard might already be in registers, and no move to
>the stack would be needed. Taking the address in the casting tricks above might
>prevent the compiler from detecting such an optimization, but a union should work.
>
>Regards,
>Dieter

Yes Dieter, I tried a lot. In the meantime I have two header and cpp files
containing a lot of popcount and bitscan routines, both portable C and
Athlon-specific MMX code, with a fair amount of conditional compilation
directives. Inlining is also an issue here. I also use the pointer cast to
extract the 32-bit words from a bitboard variable.
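For comparison, here is a minimal sketch of the two cast-free ways to split a bitboard that Dieter mentions, shift/truncate and a union. It assumes a 64-bit BitBoard typedef and C99 fixed-width types; the function names are hypothetical and not taken from either engine, and whether a given compiler generates code as good as the pointer-cast version has to be checked in its output.

```c
#include <stdint.h>

typedef uint64_t BitBoard;   /* assumption: bitboard is a 64-bit unsigned integer */

/* Count set bits in one 32-bit word; each pass clears the lowest set bit. */
static int popcount32_sparse(uint32_t w)
{
    int n = 0;
    while (w) {
        w &= w - 1;          /* clear lowest set bit */
        n++;
    }
    return n;
}

/* Split by shift and truncation: no pointer cast, no endianness dependence. */
static int popcount64_shift(BitBoard bb)
{
    return popcount32_sparse((uint32_t)bb)
         + popcount32_sparse((uint32_t)(bb >> 32));
}

/* Split through a union: which half lands in w[0] depends on endianness,
   but the sum of both counts does not. */
static int popcount64_union(BitBoard bb)
{
    union { BitBoard b; uint32_t w[2]; } u;
    u.b = bb;
    return popcount32_sparse(u.w[0]) + popcount32_sparse(u.w[1]);
}
```

Neither variant takes the address of the argument, so an inlined call leaves the compiler free to keep the bitboard in registers.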

For popcount I found a combination of MMX routines and the above
assembler routine, for bitboards with a low population probability, to be fastest.
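To illustrate that trade-off in portable C, here is a rough sketch that pairs the bit-clearing loop with a constant-time parallel count. The parallel routine is a generic SWAR popcount used as a stand-in; it is not the MMX code, which is not shown in this thread, and both function names are hypothetical.

```c
#include <stdint.h>

/* Loop count: one iteration per set bit, so cheap when few bits are set. */
static int popcount64_sparse(uint64_t bb)
{
    int n = 0;
    while (bb) {
        bb &= bb - 1;        /* clear lowest set bit */
        n++;
    }
    return n;
}

/* Parallel (SWAR) count: fixed cost regardless of how many bits are set. */
static int popcount64_swar(uint64_t x)
{
    x = x - ((x >> 1) & 0x5555555555555555ULL);                           /* 2-bit sums */
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL); /* 4-bit sums */
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;                           /* byte sums  */
    return (int)((x * 0x0101010101010101ULL) >> 56);                      /* add bytes  */
}
```

A call site that expects only a few set bits would use the loop version, and one that expects dense bitboards the parallel version. Where the break-even point lies depends on the CPU and compiler and has to be measured.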

Regards,
Gerd




Last modified: Thu, 15 Apr 21 08:11:13 -0700
