Computer Chess Club Archives

Search

Terms
Messages

Subject: Re: Countbits() Function

Author: Ernst A. Heinz
Date: 11:27:16 01/03/99
On January 03, 1999 at 10:19:17, Carsten Kossendey wrote:

>On January 03, 1999 at 08:29:23, Roberto Waldteufel wrote:
>
>>Hi all,
>>
>>I recently did some code timing experiments, and I thought the results might be
>>useful to some of you here...
>>
>>I coded two versions of a function to count the number of 1-bits in a bitboard.
>>One method is to use a loop with the bitscan instructions, repeated on the low
>>and high order 32-bit chunks. The other uses only logical instructions, with no
>>loops. See Ernst Heinz's DarkThought site for a detailed account of these
>>methods. I was interested to compare the methods on Pentium 32-bit architecture:
>>the loop is best when only a few bits are set (it terminates quickly) with time
>>proportional to the number of bits set, and the second method is faster if many
>>bits are set, since it does not depend on how many bits are set.
>>
>>It turned out for me that the loop is preferable if at most 6 bits are set, and
>>for 7 or more bits the logical instructions perform best. With exactly 7 bits,
>>the difference is less than 1%. That was on a PII 333MHz box. The break-even
>>point would be much higher on any Intel without fast bit-scans, ie bit scans are
>>*much* faster on PII and PPro than on earlier Intel CPU's, whereas the logical
>>instructon method would take about the same time for any bitboard, dependant
>>only on clock speed.
>>
>>Hope this is useful for someone....:-)
>>
>>Roberto
>
>Jumping onto the same train, I did similar tests on the PowerPC a good while
>ago, with around the same results (loops faster up to 6 or 7 bits), which is
>interesting considering that the PPC doesn't have BSF/BSR instructions, but you
>must rather use two instructions to do that:
>
>  cntlzw rTemp,   rSource               ; find first 1 bit
>  rlwnm. rSource, rSource, rTemp, 0, 30 ; rotate to LSB and mask out
>
>Carsten

No need to use assembler here because the wonders of binary subtraction and the
two's complement allow for portable yet efficient ANSI-C formulations of both
iterative and non-iterative population-count functions/macros.

Below follows the according quote from my article "How DarkThought Plays Chess"
as published in the ICCA Journal 20(3) in September 1997 and available as an
electronic preprint from the WWW pages of "DarkThought" at URL
<http://wwwipd.ira.uka.de/Tichy/DarkThought/>.

=Ernst=

********************************************************************************

Bitboard Infrastructure

DARKTHOUGHT makes extensive use of bitboards stored as 64bit unsigned integers
where each
bit represents a square of the chess board numbered from a8 = 63 to h1 = 0.
Because the bit length
of integers differs between compilers, machines, and operating systems,
DARKTHOUGHT defines
a set of portable data types like unsigned 8, ..., unsigned 64 that are
automatically mapped to the
appropriate types of the target platform (e.g. unsigned 64 defaults to unsigned
long long in
GNU C).

In order to simplify the handling of bitboards, DARKTHOUGHT contains an abundant
collection of
functions and macros that provide efficient implementations of all the needed
bitboard operations.
Beside the basic bitwise AND, OR, XOR etc. the collection includes cascaded
combinations
thereof because they are frequently used and may even be executed in one cycle
on some
machines. The expression (x & ~y), for instance, corresponds to a single DEC
Alpha
instruction. Other important bitboard operations enjoy marvelously simple yet
non-obvious
formulations due to the wonders of binary subtraction and the two's complement.
Two famous
members of this class are the expressions (b & -b) which clears all but the
least significant 1-bit
of a bitboard and (b & (b - 1)) that clears only the least significant 1-bit.

Unfortunately, the central find-bit and population-count operations have higher
algorithmic
complexity. Although a few CPUs like Motorola PowerPC, Sun UltraSparc and the
forthcoming
DEC Alpha-21264 feature instruction-level support of these operations, most
current machines
still require software implementations. The efficiency of the implementations in
terms of speed and
memory consumption is of crucial importance for bitboard-based chess programs
because it
tends to limit their overall performance. In this respect, the short expression
for clearing the least
significant 1-bit gives rise to a neat formulation of an iterative
population-count function.

unsigned int iterative_popcount(unsigned_64 b) {
    unsigned int n;
    for (n = 0; b != 0; n++, b &= (b - 1));
    return n;
}

If the given bitboard contains few 1-bits, the loop terminates quickly thus
making this short and
easily inlinable function run fast. Otherwise, DARKTHOUGHT prefers the following
non-iterative
formulation that stems from the well-known ``Hacker's Memory'' collection of
programming
tricks. It performs better than intuitive methods with lookup tables because the
tables get either
too large or need too many lookups.

#define m1 ((unsigned_64) 0x5555555555555555)
#define m2 ((unsigned_64) 0x3333333333333333)

unsigned int non_iterative_popcount(const unsigned_64 b) {
    unsigned_32 n;
    const unsigned_64 a = b - ((b >> 1) & m1);
    const unsigned_64 c = (a & m2) + ((a >> 2) & m2);
    n = ((unsigned_32) c) + ((unsigned_32) (c >> 32));
    n = (n & 0x0F0F0F0F) + ((n >> 4) & 0x0F0F0F0F);
    n = (n & 0xFFFF) + (n >> 16);
    n = (n & 0xFF) + (n >> 8);
    return n;
}

The idea of this population-count trick can also be applied to the find-bit
problem yielding a
find-function without any memory accesses. Unfortunately, however, this find-bit
function is
slower than fine-tuned table lookups. As executed by DARKTHOUGHT the lookups
generate
negligible memory traffic and compile to roughly 15 machine instructions on a
DEC Alpha while
relying on the conditional data-move capabilities of modern CPUs. This clever
find-bit
implementation further exploits the fact that the number of the least
significant 1-bit of a bitboard
is equal to the number of the most significant 1-bit of the bitboard xor'ed with
itself minus one:
FIND LSB(b) = FIND MSB(b ^ (b - 1)).
Re: Countbits() Function Carsten Kossendey 17:47:04 01/03/99
- Re: Countbits() Function Roberto Waldteufel 10:58:13 01/05/99
- Re: Countbits() Function Ernst A. Heinz 02:18:00 01/04/99
Re: Countbits() Function Daniel Clausen 12:14:38 01/03/99
This page took 0.02 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.