Author: Gerd Isenberg
Date: 02:29:13 04/19/03
On April 18, 2003 at 15:11:34, Dieter Buerssner wrote:
>On April 18, 2003 at 03:17:52, Gerd Isenberg wrote:
>
>>static __forceinline int _fastcall BitCount8 (BitBoard bb)
>>{
>>  __asm
>>  {
>>      mov  ecx, dword ptr bb
>>      xor  eax, eax
>>      test ecx, ecx
>>      jz   l1
>>  l0: lea  edx, [ecx-1]
>>      inc  eax
>>      and  ecx, edx
>>      jnz  l0
>>  l1: mov  ecx, dword ptr bb+4
>>      test ecx, ecx
>>      jz   l3
>>  l2: lea  edx, [ecx-1]
>>      inc  eax
>>      and  ecx, edx
>>      jnz  l2
>>  l3:
>>  }
>>}
>
>Gerd, did you check, if this is really faster than corresponding C-code?
>
>int PopCount(BITBOARD a)
>{
>  unsigned long w;
>  int n = 0;
>  w = *(unsigned long *)&a;
>  if (w)
>    do
>    {
>      n++;
>    }
>    while ((w &= w-1) != 0);
>  w = *(((unsigned long *)&a)+1);
>  if (w)
>    do
>    {
>      n++;
>    }
>    while ((w &= w-1) != 0);
>  return n;
>}
>
>This produces almost exactly your assembler code. The nasty casts are there
>because the compiler did a poor job optimizing the corresponding
>
> w = a & 0xffffffffUL;
>
>and
>
> w = (a >> 32) & 0xffffffffUL;
>
>which would even make the code portable (even to architectures where unsigned
>long has more than 32 bits; where it has exactly 32, the masks are optimized away).
>
>Other tricks can convince the compiler to produce good code (like using a
>union, but that is not portable either). The C code might even be faster when
>inlined: the bitboard might already be in registers, so no move to the stack
>would be needed. Taking the address, as the casting tricks above do, may keep
>the compiler from detecting such an optimization, but a union should work.
>
>Regards,
>Dieter
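Dieter's two alternatives to the pointer casts (shift-and-mask, and a union) can be sketched as follows. This is a minimal sketch, assuming BITBOARD is a 64-bit unsigned integer (here `uint64_t`); the helper `popcount32_loop`, the functions `PopCountShift` and `PopCountUnion`, and the union type `BitBoardHalves` are hypothetical names for illustration. Note that the casts in the quoted PopCount also assume a 32-bit `unsigned long`; on LP64 platforms, where `unsigned long` is 64 bits, the second read would go past the end of `a`.

```c
#include <stdint.h>

/* Assumption: BITBOARD is a 64-bit unsigned integer. */
typedef uint64_t BITBOARD;

/* Hypothetical helper: counts set bits with the w &= w - 1 loop.
   Each iteration clears the lowest set bit, so the loop runs once per one-bit. */
static int popcount32_loop(uint32_t w)
{
    int n = 0;
    while (w != 0) {
        w &= w - 1;
        n++;
    }
    return n;
}

/* Shift-and-mask split: no address is taken, so an inlined call
   can keep the bitboard in registers. */
static int PopCountShift(BITBOARD a)
{
    uint32_t lo = (uint32_t)(a & 0xffffffffUL);
    uint32_t hi = (uint32_t)((a >> 32) & 0xffffffffUL);
    return popcount32_loop(lo) + popcount32_loop(hi);
}

/* Union split: which element holds the low half depends on byte order,
   but the sum of both counts does not. */
typedef union {
    BITBOARD bb;
    uint32_t w[2];
} BitBoardHalves;

static int PopCountUnion(BITBOARD a)
{
    BitBoardHalves u;
    u.bb = a;
    return popcount32_loop(u.w[0]) + popcount32_loop(u.w[1]);
}
```

Neither version takes the address of `a`. Reading a union member other than the one last written reinterprets the bytes in C, but is not allowed in C++.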
Yes Dieter, I tried a lot. In the meantime I have two header and cpp files
containing a lot of popcount and bitscan stuff, both portable C and
Athlon-specific MMX code, with plenty of conditional compilation directives.
Inlining is also an issue here. I also use the pointer cast to extract the
32-bit words from a bitboard variable.
For popcount, I found a combination of MMX routines and the above
assembler routine (for bitboards with a low population probability) fastest.
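The idea of choosing the routine by expected population can be sketched in portable C. This is a hedged sketch, not Gerd's code: his MMX routines are not shown in the thread, so a branch-free SWAR ("SIMD within a register") count is swapped in as a portable stand-in for them. The names `PopCountSparse` and `PopCountDense` are hypothetical, and BITBOARD is assumed to be `uint64_t`.

```c
#include <stdint.h>

/* Assumption: BITBOARD is a 64-bit unsigned integer. */
typedef uint64_t BITBOARD;

/* Loop version: one iteration per set bit, so it is cheap for
   sparse bitboards such as piece sets. */
static int PopCountSparse(BITBOARD bb)
{
    int n = 0;
    while (bb) {
        bb &= bb - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* SWAR version: fixed cost regardless of population, so it suits
   denser bitboards. Stand-in for MMX code, not a translation of it. */
static int PopCountDense(BITBOARD bb)
{
    bb = bb - ((bb >> 1) & 0x5555555555555555ULL);                           /* 2-bit sums */
    bb = (bb & 0x3333333333333333ULL) + ((bb >> 2) & 0x3333333333333333ULL); /* 4-bit sums */
    bb = (bb + (bb >> 4)) & 0x0F0F0F0F0F0F0F0FULL;                           /* byte sums  */
    return (int)((bb * 0x0101010101010101ULL) >> 56);  /* add all bytes into the top byte */
}
```

The caller picks the routine from what it is counting, since it usually knows in advance whether a bitboard is likely sparse. On a 32-bit target the 64-bit multiply may be costly, which is one reason to work on 32-bit halves as the assembler routine does.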
Regards,
Gerd