Author: Oliver Roese
Date: 03:22:10 04/24/00
Go up one level in this thread
On April 19, 2000 at 22:10:19, Flemming Rodler wrote:
>Dear all,
>
>
>Thanks for all the post I got to my thread "Bit counting below". For the problem
>of counting the number of bits in a 64 bit variable I implemented the 3 schemes
>suggested in the answers plus another scheme that I found on the web. Here are
>my findings.
>
>Test computer: AMD K6-II 350 Mhz.
>
>Test: I timed how long each scheme took by calling them 2^20 and 4*2^20 times.
> This was done for different number bits set to one.
>
>The different schemes are listed at the end of this post as Method 1,..
>
>Result: for 2^20 repeatition.
>
> #bits on: 1 2 3 4 5 6 7 8
> ----------------------------------------------------------------------
> time in sec. Method 1: 0.07 0.10 0.12 0.15 0.17 0.19 0.22 0.24
>
> Method 2: 0.15 regardless of #bits
> Method 3: 0.20
> Method 4: 0.20
>
>
>Result: for 4*2^20 repeatition.
>
> #bits on: 1 2 3 4 5 6 7 8
> ----------------------------------------------------------------------
> time in sec. Method 1: 0.30 0.40 0.49 0.59 0.69 0.78 0.87 0.98
>
> Method 2: 0.59 regardless of #bits
> Method 3: 0.77
> Method 4: 0.77
>
>
>So we see that 4 bits is the break-even point for Method 1 and Method 2.
>I do not know if it has any practical influence which methods you use in a chess
>program but I might try it out with Crafty which I believe uses method 1 on a
>PC. It would be interesting to change it to method 2 and see if there is a
>significant change in the number of nodes searched per second. If so then it
>might be good to use different versions when counting pawns and f.ex. queens.
>
>
>I hope you will find the info useful and if you have any ideas of how to make
>them even faster please let me know so I can update the test results.
>
>Best regards
>Flemming
>
>
>Here are the 4 used methods. They were all compiled with gcc and -O3 turned onI
>do not who to give credit for the various methods. Sorry for that. The first two
>methods I have from the previous thread of posts. The third method I have from
>the DarkThought webpages and the fourth method I got from the link supplied in
>the previous posts.
>
>
>
>Method 1:
>int bitcount1(register unsigned64 b) {
>
> register int c=0;
>
> for(;b;c++,b &= b-1);
>
> return c;
>}
>
>
>Method 2:
>static char nbits[256] =
>{
> 0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4,
> 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
> 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
> 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
> 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
> 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
> 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
> 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
> 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
> 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
> 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
> 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
> 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
> 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
> 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
> 4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8,
>};
>
>int bitcount2(unsigned64 b)
>{
> register unsigned long *n = (unsigned long *) &b;
> return
> nbits[(unsigned char) n[0]] +
> nbits[(unsigned char) (n[0] >> 8)] +
> nbits[(unsigned char) (n[0] >> 16)] +
> nbits[(unsigned char) (n[0] >> 24)] +
> nbits[(unsigned char) n[1]] +
> nbits[(unsigned char) (n[1] >> 8)] +
> nbits[(unsigned char) (n[1] >> 16)] +
> nbits[(unsigned char) (n[1] >> 24)];
>}
>
>
>Method 3: I added the register, but it not make any difference
>
>int bitcount3(register unsigned64 b) {
> register unsigned32 n;
> register const unsigned64 a = b - ((b >> 1) & m1);
> register const unsigned64 c = (a & m2) + ((a >> 2) & m2);
> n = ((unsigned32) c) + ((unsigned32) (c >> 32));
> n = (n & 0x0F0F0F0F) + ((n >> 4) & 0x0F0F0F0F);
> n = (n & 0xFFFF) + (n >> 16);
> n = (n & 0xFF) + (n >> 8);
> return n;
>}
>
>
>Method 4:
>int bitcount4(register unsigned64 b)
>{
> register unsigned long i1,i2;
>
> i1 = ((b & 0xAAAAAAAAL) >> 1) + (b & 0x55555555L);
> i1 = ((i1 & 0xCCCCCCCCL) >> 2) + (i1 & 0x33333333L);
> i1 = ((i1 & 0xF0F0F0F0L) >> 4) + (i1 & 0x0F0F0F0FL);
> i1 = ((i1 & 0xFF00FF00L) >> 8) + (i1 & 0x00FF00FFL);
> i1 = ((i1 & 0xFFFF0000L) >> 16) + (i1 & 0x0000FFFFL);
>
> i2 = (((b>>32) & 0xAAAAAAAAL) >> 1) + ((b>>32) & 0x55555555L);
> i2 = ((i2 & 0xCCCCCCCCL) >> 2) + (i2 & 0x33333333L);
> i2 = ((i2 & 0xF0F0F0F0L) >> 4) + (i2 & 0x0F0F0F0FL);
> i2 = ((i2 & 0xFF00FF00L) >> 8) + (i2 & 0x00FF00FFL);
> i2 = ((i2 & 0xFFFF0000L) >> 16) + (i2 & 0x0000FFFFL);
>
> return (int)(i1+i2);
>}
Hi all!
I followed the thread with great interest.
A fast bitcounting method is of great importance, not only for chessprograms, i
think.
For example the c++ standard library provides a class "bitset", with a method
"count" or so. It uses a table of 256 values (one for each possible char) and
adds the values up.
I am not that expert in hardware, but memory-throughput is a great issue for
modern processors. A cachehungry implementation will eat up also several
internal ressources.
Maybe implementations 3 or 4 are not so bad, regarding this background, but i am
only speculating...
Some time ago i reviewed the mentioned "HAKMEM"
(http://www.inwap.com/pdp10/hbaker/hakmem/hakmem.html) and stumbled across the
following idea:
Let n be a natural number, written with cyphers a0,...,ak in base m
n = a0 + a1*m + a2*m^2 + ... + ak*m^k
Then (since m mod (m-1) = 1)
n mod (m-1) = (a0 + a1 + ... + ak) mod (m-1)
For example with n a 32bit register, you can add up all the chars modulo 255
by writing
n = n % 0xff;
That looks not very attractive, since there is apparently a division involved.
But its not true! Divisions and modulo-operations with constant divisors can be
emulated, using a fixpointlike approach. The famous gcc knows how to handle
these cases!
I tested it out, using the pentium cyclecounter:
Algorithm 3 without improvement: 44..45 cycles.
Algorithm 3 with improvement: 42..43 cycles.
Not much, but better!
Hope it helped.
Oliver Roese
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.