Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Countbits() Function

Author: Eugene Nalimov

Date: 10:55:09 01/03/99

Go up one level in this thread


On January 03, 1999 at 10:19:17, Carsten Kossendey wrote:

>On January 03, 1999 at 08:29:23, Roberto Waldteufel wrote:
>
>>Hi all,
>>
>>I recently did some code timing experiments, and I thought the results might be
>>useful to some of you here...
>>
>>I coded two versions of a function to count the number of 1-bits in a bitboard.
>>One method is to use a loop with the bitscan instructions, repeated on the low
>>and high order 32-bit chunks. The other uses only logical instructions, with no
>>loops. See Ernst Heinz's DarkThought site for a detailed account of these
>>methods. I was interested to compare the methods on Pentium 32-bit architecture:
>>the loop is best when only a few bits are set (it terminates quickly) with time
>>proportional to the number of bits set, and the second method is faster if many
>>bits are set, since it does not depend on how many bits are set.
>>
>>It turned out for me that the loop is preferable if at most 6 bits are set, and
>>for 7 or more bits the logical instructions perform best. With exactly 7 bits,
>>the difference is less than 1%. That was on a PII 333MHz box. The break-even
>>point would be much higher on any Intel without fast bit-scans, ie bit scans are
>>*much* faster on PII and PPro than on earlier Intel CPU's, whereas the logical
>>instructon method would take about the same time for any bitboard, dependant
>>only on clock speed.
>>
>>Hope this is useful for someone....:-)
>>
>>Roberto
>
>Jumping onto the same train, I did similar tests on the PowerPC a good while
>ago, with around the same results (loops faster up to 6 or 7 bits), which is
>interesting considering that the PPC doesn't have BSF/BSR instructions, but you
>must rather use two instructions to do that:
>
>  cntlzw rTemp,   rSource               ; find first 1 bit
>  rlwnm. rSource, rSource, rTemp, 0, 30 ; rotate to LSB and mask out
>
>Carsten

Actually, you don't need BSF/BSR instructions here. You can
use exactly the same trick that Robert Hyatt used in Crafty
source - use the fact that A&(A-1) clears least significant
bit. In Crafty 16.3 source there is in-line assembly stuff
for VC++ as well as C function. Function that counts bits is
just 2 loops, first handles lower half of 64-bit bitboard,
and second handles upper half:

        mov     ecx, low_part_of_the_bitboard
        xor     eax, eax
        test    ecx, ecx
        jz      end_of_loop1
loop1:
        lea     edx, [ecx-1]
        inc     eax
        and     ecx, edx
        jnz     loop1
end_of_loop1:
        mov     ecx,high_part_of_the_bitboard
        test    ecx, ecx
        jz      end_of_loop2
loop2:
        lea     edx, [ecx-1]
        inc     eax
        and     ecx, edx
        jnz     loop2
end_of_loop:
        ret

I will not be surpised if that function will be faster even
on x86 processors that have fast BSF/BSR. Even if it's
slightly slower, it has the benefit that it's portable -
it will work reasonable well on *any* x86 compatible, be it
AMD, Cyrix, IDT, etc.

Eugene



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.