Author: Gerd Isenberg
Date: 02:29:13 04/19/03
On April 18, 2003 at 15:11:34, Dieter Buerssner wrote:

>On April 18, 2003 at 03:17:52, Gerd Isenberg wrote:
>
>>static __forceinline int _fastcall BitCount8 (BitBoard bb)
>>{
>>  __asm
>>  {
>>      mov  ecx, dword ptr bb
>>      xor  eax, eax
>>      test ecx, ecx
>>      jz   l1
>>  l0: lea  edx, [ecx-1]
>>      inc  eax
>>      and  ecx, edx
>>      jnz  l0
>>  l1: mov  ecx, dword ptr bb+4
>>      test ecx, ecx
>>      jz   l3
>>  l2: lea  edx, [ecx-1]
>>      inc  eax
>>      and  ecx, edx
>>      jnz  l2
>>  l3:
>>  }
>>}
>
>Gerd, did you check whether this is really faster than the corresponding C code?
>
>int PopCount(BITBOARD a)
>{
>  unsigned long w;
>  int n = 0;
>  w = *(unsigned long *)&a;
>  if (w)
>    do
>    {
>      n++;
>    }
>    while ((w &= w-1) != 0);
>  w = *(((unsigned long *)&a)+1);
>  if (w)
>    do
>    {
>      n++;
>    }
>    while ((w &= w-1) != 0);
>  return n;
>}
>
>This produces almost exactly your assembler code. The nasty casts are there
>because the compiler did not optimize the corresponding
>
>  w = a & 0xffffffffUL;
>
>and
>
>  w = (a >> 32) & 0xffffffffUL;
>
>well, which would even make the code portable (even to architectures where
>unsigned long has more than 32 bits; otherwise the masks will be optimized away).
>
>Other tricks are possible to convince the compiler to produce good code (like
>using a union, but that is not portable either). The C code might even be faster
>when inlined: the bitboard might already be in registers, and no moving to the
>stack would be needed. Taking the address in the casting tricks above might make
>such an optimization impossible for the compiler to detect, but a union should work.
>
>Regards,
>Dieter

Yes Dieter, I tried a lot. In the meantime I have two header and cpp files containing a lot of popcount and bitscan stuff, both portable C and Athlon-specific MMX code, with plenty of conditional compilation directives. Inlining is also an issue here. I also use the pointer cast to extract the 32-bit words from a bitboard variable.
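Dieter's two cast-free ways of splitting the bitboard can be sketched in portable C. This is a minimal sketch assuming a 64-bit `uint64_t` bitboard, with `uint32_t` halves used in place of `unsigned long`; the names `BitBoard`, `count_word`, `popcount_shift`, `BitBoardWords` and `popcount_union` are hypothetical. Both variants feed the same clear-lowest-bit loop (`w &= w - 1`) as the code quoted above.

```c
#include <stdint.h>

typedef uint64_t BitBoard;  /* assumption: bitboard is a 64-bit unsigned integer */

/* Count set bits in one 32-bit word; each iteration clears the lowest set bit,
   so the loop runs once per set bit. */
static int count_word(uint32_t w)
{
    int n = 0;
    while (w) {
        w &= w - 1;
        n++;
    }
    return n;
}

/* Variant 1: shift/mask split, no pointer casts involved. */
static int popcount_shift(BitBoard a)
{
    return count_word((uint32_t)(a & 0xffffffffUL))
         + count_word((uint32_t)(a >> 32));
}

/* Variant 2: union split. Which member holds the low word depends on byte order,
   but the sum of both halves is the same either way. */
typedef union {
    BitBoard bb;
    uint32_t w[2];
} BitBoardWords;

static int popcount_union(BitBoard a)
{
    BitBoardWords u;
    u.bb = a;
    return count_word(u.w[0]) + count_word(u.w[1]);
}
```

Whether a given compiler keeps the bitboard in registers for either variant, as Dieter hopes, has to be checked in the generated code; the sketch only shows the source-level forms.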
For popcount I found it fastest to combine the MMX routines with the above assembler routine, using the latter for bitboards that are likely to be sparsely populated.

Regards,
Gerd
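The split between a sparse and a dense popcount can be sketched portably as below. This is not Gerd's MMX code: the dense routine is a branchless SWAR (SIMD-within-a-register) parallel bit count, swapped in here as a portable stand-in for an MMX routine. The names `popcount_sparse` and `popcount_dense` are hypothetical, and the sketch assumes a 64-bit `uint64_t` bitboard on a target with fast 64-bit arithmetic.

```c
#include <stdint.h>

/* Sparse count: one loop iteration per set bit, so it is cheap when only a
   few bits are set. */
static int popcount_sparse(uint64_t bb)
{
    int n = 0;
    while (bb) {
        bb &= bb - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Dense count: SWAR reduction whose cost does not depend on how many bits
   are set (stand-in for an MMX routine, not the original). */
static int popcount_dense(uint64_t bb)
{
    bb = bb - ((bb >> 1) & 0x5555555555555555ULL);                           /* 2-bit sums */
    bb = (bb & 0x3333333333333333ULL) + ((bb >> 2) & 0x3333333333333333ULL); /* 4-bit sums */
    bb = (bb + (bb >> 4)) & 0x0F0F0F0F0F0F0F0FULL;                           /* 8-bit sums */
    return (int)((bb * 0x0101010101010101ULL) >> 56);  /* add all bytes into the top byte */
}
```

A caller would pick `popcount_sparse` for bitboards that rarely hold more than a few bits and `popcount_dense` for denser sets; where that boundary lies is an assumption to measure on the target machine, not a rule from the post.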
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.