Computer Chess Club Archives


Subject: Re: is this really faster?

Author: Dieter Buerssner

Date: 12:11:34 04/18/03


On April 18, 2003 at 03:17:52, Gerd Isenberg wrote:

>static __forceinline int _fastcall BitCount8 (BitBoard bb)
>{
>	__asm
>	{
>		mov     ecx, dword ptr bb
>		xor     eax, eax
>		test    ecx, ecx
>		jz      l1
>	    l0: lea     edx, [ecx-1]
>		inc     eax
>		and     ecx, edx
>		jnz     l0
>	    l1: mov     ecx, dword ptr bb+4
>		test    ecx, ecx
>		jz      l3
>	    l2: lea     edx, [ecx-1]
>		inc     eax
>		and     ecx, edx
>		jnz     l2
>	    l3:
>	}
>}

Gerd, did you check whether this is really faster than the corresponding C code?

int PopCount(BITBOARD a)
{
  unsigned long w;
  int n = 0;
  w = *(unsigned long *)&a;
  if (w)
  do
  {
    n++;
  }
  while ((w &= w-1) != 0);
  w = *(((unsigned long *)&a)+1);
  if (w)
  do
  {
    n++;
  }
  while ((w &= w-1) != 0);
  return n;
}

This produces almost exactly your assembler code. The nasty casts are there
because the compiler did not do a good job of optimizing the corresponding

  w = a & 0xffffffffUL;

and

  w = (a >> 32) & 0xffffffffUL;

which would even make the code portable, including to architectures where
unsigned long has more than 32 bits. Where unsigned long has exactly 32 bits,
the masks will be optimized away.
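For reference, the portable variant described above might look roughly like the following. This is only a sketch: the typedef of BITBOARD as uint64_t and the name PopCountPortable are my assumptions for illustration, not code from either engine, and whether a given compiler turns it into code as tight as the cast version has to be checked in the generated assembly.

```c
#include <stdint.h>

/* Assumption: a 64-bit bitboard type; the engine's own typedef may differ. */
typedef uint64_t BITBOARD;

/* Hypothetical name. Counts set bits by repeatedly clearing the lowest
   set bit, handling the low and high 32-bit halves separately. */
int PopCountPortable(BITBOARD a)
{
    /* Shift and mask instead of pointer casts: no dependence on byte
       order or on the width of unsigned long. */
    uint32_t lo = (uint32_t)(a & 0xffffffffUL);
    uint32_t hi = (uint32_t)((a >> 32) & 0xffffffffUL);
    int n = 0;

    while (lo) {
        n++;
        lo &= lo - 1;   /* clear lowest set bit */
    }
    while (hi) {
        n++;
        hi &= hi - 1;   /* clear lowest set bit */
    }
    return n;
}
```

The loop count equals the number of set bits, so this form only pays off for sparsely populated bitboards, the same as the assembler and cast versions.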

Other tricks are possible to convince the compiler to produce good code (like
using a union, though that is not portable either). The C code might even be
faster when inlined: the bitboard might already be in registers, so no move to
the stack would be needed. Taking the address in the casting tricks above might
prevent the compiler from detecting that optimization, but a union should work.
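A union-based variant could be sketched as below. The names SplitBoard and PopCountUnion are made up for illustration, and BITBOARD is again assumed to be a 64-bit unsigned type. Reading a union member other than the one last written relies on the implementation's object representation, so this is not strictly portable, but since both halves are counted the result does not depend on byte order.

```c
#include <stdint.h>

/* Assumption: a 64-bit bitboard type; the engine's own typedef may differ. */
typedef uint64_t BITBOARD;

/* Hypothetical helper type: view one bitboard as two 32-bit words. */
typedef union {
    BITBOARD bb;
    uint32_t half[2];
} SplitBoard;

/* Hypothetical name. Same bit-clearing loop as the cast version, but the
   halves are reached through a union instead of through pointer casts. */
int PopCountUnion(BITBOARD a)
{
    SplitBoard s;
    uint32_t w;
    int n = 0;

    s.bb = a;

    w = s.half[0];
    while (w) {
        n++;
        w &= w - 1;     /* clear lowest set bit */
    }

    w = s.half[1];
    while (w) {
        n++;
        w &= w - 1;     /* clear lowest set bit */
    }
    return n;
}
```

Whether the compiler keeps the bitboard in registers here when the function is inlined is something I would verify in the assembler output rather than assume.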

Regards,
Dieter




