Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: PopCount optimization

Author: Dann Corbit

Date: 15:56:56 03/11/04

Go up one level in this thread


On March 11, 2004 at 10:59:28, Richard Pijl wrote:

>On March 11, 2004 at 10:31:22, milix wrote:
>
>>Hello to everyone.
>>I'm trying to optimize various parts of my bitboard engine and I found an
>>optimization for my popcount() routine. I was using the routine:
>>
>>int popcount_1(const bitboard bb) {
>>   register int cnt=0;
>>   register bitboard a=bb;
>>   while (a) {
>>      cnt += bytecount[a & 0xff];
>>      a >>= 8;
>>   }
>>   return cnt;
>>}
>
>Here you are shifting a 64 bits integer on a 32 bits architecture. Basically
>you're doing a lot of working shifting bits from one word to the other.
>Richard.

Also, there is a needless double copy of the passed in bitboard.
In C (and C++ unless you use references) everything is passed by value.
Hence, a const on a non-pointer paramter is always useless.  You are already
working on a copy.  Also, it is pointless to decorate your programs with the
register hint.  This routine does the same thing as his first routine but
faster:

int popcount_1(bitboard a) {
   int cnt=0;
   while (a) {
      cnt += bytecount[a & 0xff];
      a >>= 8;
   }
   return cnt;
}

There is no damage to 'bitboard a' in the calling program, since it is passed by
value (a copy is made).

An aside on the register hint:
Don't do it.  Not only does it not make your program faster (unless you have the
world's lamest compiler) but it actually hinders your program.

For instance, this is not legal C or C++:

register int foo;
int *bar = &foo; /* You cannot take the address of a register... */

>>where bytecount is a looup table for byte counts.
>>
>>I changed this routine a bit to
>>
>>int popcount_2(const bitboard bb) {
>>   register int cnt=0;
>>   unsigned long a1 = *((unsigned long*)&bb);
>>   unsigned long a2 = *(((unsigned long*)&bb)+1);
>>   while (a1) {
>>      cnt += bytecount[a1 & 0xff];
>>      a1 >>= 8;
>>   }
>>   while (a2) {
>>      cnt += bytecount[a2 & 0xff];
>>      a2 >>= 8;
>>   }
>>   return cnt;
>>}
>>
>>and the routine popcount_2 was 2x times faster than popcount_1. It is even
>>faster than the assembly routine provided by AMD for popcounts. I tested the
>>routines using 1000 randomly generated bitboards.
>>Do I miss something here or it is an improvement? Have anyone else tried a
>>similar idea?
>>
>>My computer specs are:
>>AMD Athlon XP 1600+
>>512 MB RAM



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.