Author: Dieter Buerssner
Date: 12:11:34 04/18/03
On April 18, 2003 at 03:17:52, Gerd Isenberg wrote:

>static __forceinline int _fastcall BitCount8 (BitBoard bb)
>{
>    __asm
>    {
>        mov   ecx, dword ptr bb
>        xor   eax, eax
>        test  ecx, ecx
>        jz    l1
>    l0: lea   edx, [ecx-1]
>        inc   eax
>        and   ecx, edx
>        jnz   l0
>    l1: mov   ecx, dword ptr bb+4
>        test  ecx, ecx
>        jz    l3
>    l2: lea   edx, [ecx-1]
>        inc   eax
>        and   ecx, edx
>        jnz   l2
>    l3:
>    }
>}

Gerd, did you check whether this is really faster than the corresponding C code?

int PopCount(BITBOARD a)
{
    unsigned long w;
    int n = 0;

    /* first 32-bit word of the bitboard */
    w = *(unsigned long *)&a;
    if (w)
        do {
            n++;
        } while ((w &= w-1) != 0);   /* clear the lowest set bit */

    /* second 32-bit word of the bitboard */
    w = *(((unsigned long *)&a)+1);
    if (w)
        do {
            n++;
        } while ((w &= w-1) != 0);

    return n;
}

This produces almost exactly your assembler code. The nasty casts are there because the compiler did not optimize the corresponding

    w = a & 0xffffffffUL;

and

    w = (a >> 32) & 0xffffffffUL;

well. Those forms would even make the code portable, including to architectures where unsigned long has more than 32 bits (where it has exactly 32 bits, the masks will be optimized away). Other tricks are possible to convince the compiler to produce good code, like using a union, but that is also not portable.

The C code might even be faster when inlined. The bitboard might already be in registers, so no move to the stack would be needed. Taking the address in the casting trick above might prevent the compiler from detecting such an optimization, but a union should work.

Regards,
Dieter
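For illustration only, here is a minimal sketch of the two alternatives mentioned in the post: the shift-and-mask form and the union. It is not code from the thread. The names popcount32, popcount_shift, popcount_union and BitBoardHalves are made up for this example, and BitBoard is assumed to be a 64-bit unsigned integer. Whether a given compiler turns either variant into code as tight as the assembler version would have to be checked in the generated output.

```c
#include <stdint.h>

/* Assumption: the bitboard is a plain 64-bit unsigned integer. */
typedef uint64_t BitBoard;

/* Count set bits in one 32-bit word by repeatedly clearing the lowest one. */
static int popcount32(uint32_t w)
{
    int n = 0;
    while (w != 0) {
        w &= w - 1;   /* clears the lowest set bit */
        n++;
    }
    return n;
}

/* Shift-and-mask variant: no address is taken, and the result does not
   depend on the width of unsigned long or on byte order. */
static int popcount_shift(BitBoard a)
{
    uint32_t lo = (uint32_t)(a & 0xffffffffUL);
    uint32_t hi = (uint32_t)((a >> 32) & 0xffffffffUL);
    return popcount32(lo) + popcount32(hi);
}

/* Union variant (hypothetical type): which element holds the low half
   depends on byte order, but the sum of both counts is the same. */
typedef union {
    BitBoard bb;
    uint32_t half[2];
} BitBoardHalves;

static int popcount_union(BitBoard a)
{
    BitBoardHalves u;
    u.bb = a;
    return popcount32(u.half[0]) + popcount32(u.half[1]);
}
```

Both functions take the bitboard by value, so an inlining compiler is free to keep it in registers.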