Author: Ernst A. Heinz
Date: 11:27:16 01/03/99
Go up one level in this thread
On January 03, 1999 at 10:19:17, Carsten Kossendey wrote: >On January 03, 1999 at 08:29:23, Roberto Waldteufel wrote: > >>Hi all, >> >>I recently did some code timing experiments, and I thought the results might be >>useful to some of you here... >> >>I coded two versions of a function to count the number of 1-bits in a bitboard. >>One method is to use a loop with the bitscan instructions, repeated on the low >>and high order 32-bit chunks. The other uses only logical instructions, with no >>loops. See Ernst Heinz's DarkThought site for a detailed account of these >>methods. I was interested to compare the methods on Pentium 32-bit architecture: >>the loop is best when only a few bits are set (it terminates quickly) with time >>proportional to the number of bits set, and the second method is faster if many >>bits are set, since it does not depend on how many bits are set. >> >>It turned out for me that the loop is preferable if at most 6 bits are set, and >>for 7 or more bits the logical instructions perform best. With exactly 7 bits, >>the difference is less than 1%. That was on a PII 333MHz box. The break-even >>point would be much higher on any Intel without fast bit-scans, ie bit scans are >>*much* faster on PII and PPro than on earlier Intel CPU's, whereas the logical >>instructon method would take about the same time for any bitboard, dependant >>only on clock speed. >> >>Hope this is useful for someone....:-) >> >>Roberto > >Jumping onto the same train, I did similar tests on the PowerPC a good while >ago, with around the same results (loops faster up to 6 or 7 bits), which is >interesting considering that the PPC doesn't have BSF/BSR instructions, but you >must rather use two instructions to do that: > > cntlzw rTemp, rSource ; find first 1 bit > rlwnm. rSource, rSource, rTemp, 0, 30 ; rotate to LSB and mask out > >Carsten No need to use assembler here because the wonders of binary subtraction and the two's complement allow for portable yet efficient ANSI-C formulations of both iterative and non-iterative population-count functions/macros. Below follows the according quote from my article "How DarkThought Plays Chess" as published in the ICCA Journal 20(3) in September 1997 and available as an electronic preprint from the WWW pages of "DarkThought" at URL <http://wwwipd.ira.uka.de/Tichy/DarkThought/>. =Ernst= ******************************************************************************** Bitboard Infrastructure DARKTHOUGHT makes extensive use of bitboards stored as 64bit unsigned integers where each bit represents a square of the chess board numbered from a8 = 63 to h1 = 0. Because the bit length of integers differs between compilers, machines, and operating systems, DARKTHOUGHT defines a set of portable data types like unsigned 8, ..., unsigned 64 that are automatically mapped to the appropriate types of the target platform (e.g. unsigned 64 defaults to unsigned long long in GNU C). In order to simplify the handling of bitboards, DARKTHOUGHT contains an abundant collection of functions and macros that provide efficient implementations of all the needed bitboard operations. Beside the basic bitwise AND, OR, XOR etc. the collection includes cascaded combinations thereof because they are frequently used and may even be executed in one cycle on some machines. The expression (x & ~y), for instance, corresponds to a single DEC Alpha instruction. Other important bitboard operations enjoy marvelously simple yet non-obvious formulations due to the wonders of binary subtraction and the two's complement. Two famous members of this class are the expressions (b & -b) which clears all but the least significant 1-bit of a bitboard and (b & (b - 1)) that clears only the least significant 1-bit. Unfortunately, the central find-bit and population-count operations have higher algorithmic complexity. Although a few CPUs like Motorola PowerPC, Sun UltraSparc and the forthcoming DEC Alpha-21264 feature instruction-level support of these operations, most current machines still require software implementations. The efficiency of the implementations in terms of speed and memory consumption is of crucial importance for bitboard-based chess programs because it tends to limit their overall performance. In this respect, the short expression for clearing the least significant 1-bit gives rise to a neat formulation of an iterative population-count function. unsigned int iterative_popcount(unsigned_64 b) { unsigned int n; for (n = 0; b != 0; n++, b &= (b - 1)); return n; } If the given bitboard contains few 1-bits, the loop terminates quickly thus making this short and easily inlinable function run fast. Otherwise, DARKTHOUGHT prefers the following non-iterative formulation that stems from the well-known ``Hacker's Memory'' collection of programming tricks. It performs better than intuitive methods with lookup tables because the tables get either too large or need too many lookups. #define m1 ((unsigned_64) 0x5555555555555555) #define m2 ((unsigned_64) 0x3333333333333333) unsigned int non_iterative_popcount(const unsigned_64 b) { unsigned_32 n; const unsigned_64 a = b - ((b >> 1) & m1); const unsigned_64 c = (a & m2) + ((a >> 2) & m2); n = ((unsigned_32) c) + ((unsigned_32) (c >> 32)); n = (n & 0x0F0F0F0F) + ((n >> 4) & 0x0F0F0F0F); n = (n & 0xFFFF) + (n >> 16); n = (n & 0xFF) + (n >> 8); return n; } The idea of this population-count trick can also be applied to the find-bit problem yielding a find-function without any memory accesses. Unfortunately, however, this find-bit function is slower than fine-tuned table lookups. As executed by DARKTHOUGHT the lookups generate negligible memory traffic and compile to roughly 15 machine instructions on a DEC Alpha while relying on the conditional data-move capabilities of modern CPUs. This clever find-bit implementation further exploits the fact that the number of the least significant 1-bit of a bitboard is equal to the number of the most significant 1-bit of the bitboard xor'ed with itself minus one: FIND LSB(b) = FIND MSB(b ^ (b - 1)).
This page took 0.02 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.