Author: Eugene Nalimov
Date: 10:55:09 01/03/99
Go up one level in this thread
On January 03, 1999 at 10:19:17, Carsten Kossendey wrote: >On January 03, 1999 at 08:29:23, Roberto Waldteufel wrote: > >>Hi all, >> >>I recently did some code timing experiments, and I thought the results might be >>useful to some of you here... >> >>I coded two versions of a function to count the number of 1-bits in a bitboard. >>One method is to use a loop with the bitscan instructions, repeated on the low >>and high order 32-bit chunks. The other uses only logical instructions, with no >>loops. See Ernst Heinz's DarkThought site for a detailed account of these >>methods. I was interested to compare the methods on Pentium 32-bit architecture: >>the loop is best when only a few bits are set (it terminates quickly) with time >>proportional to the number of bits set, and the second method is faster if many >>bits are set, since it does not depend on how many bits are set. >> >>It turned out for me that the loop is preferable if at most 6 bits are set, and >>for 7 or more bits the logical instructions perform best. With exactly 7 bits, >>the difference is less than 1%. That was on a PII 333MHz box. The break-even >>point would be much higher on any Intel without fast bit-scans, ie bit scans are >>*much* faster on PII and PPro than on earlier Intel CPU's, whereas the logical >>instructon method would take about the same time for any bitboard, dependant >>only on clock speed. >> >>Hope this is useful for someone....:-) >> >>Roberto > >Jumping onto the same train, I did similar tests on the PowerPC a good while >ago, with around the same results (loops faster up to 6 or 7 bits), which is >interesting considering that the PPC doesn't have BSF/BSR instructions, but you >must rather use two instructions to do that: > > cntlzw rTemp, rSource ; find first 1 bit > rlwnm. rSource, rSource, rTemp, 0, 30 ; rotate to LSB and mask out > >Carsten Actually, you don't need BSF/BSR instructions here. You can use exactly the same trick that Robert Hyatt used in Crafty source - use the fact that A&(A-1) clears least significant bit. In Crafty 16.3 source there is in-line assembly stuff for VC++ as well as C function. Function that counts bits is just 2 loops, first handles lower half of 64-bit bitboard, and second handles upper half: mov ecx, low_part_of_the_bitboard xor eax, eax test ecx, ecx jz end_of_loop1 loop1: lea edx, [ecx-1] inc eax and ecx, edx jnz loop1 end_of_loop1: mov ecx,high_part_of_the_bitboard test ecx, ecx jz end_of_loop2 loop2: lea edx, [ecx-1] inc eax and ecx, edx jnz loop2 end_of_loop: ret I will not be surpised if that function will be faster even on x86 processors that have fast BSF/BSR. Even if it's slightly slower, it has the benefit that it's portable - it will work reasonable well on *any* x86 compatible, be it AMD, Cyrix, IDT, etc. Eugene
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.