Computer Chess Club Archives

Search

Terms

Messages

Subject: PopCount optimization

Author: milix

Date: 07:31:22 03/11/04

Hello to everyone.
I'm trying to optimize various parts of my bitboard engine and I found an
optimization for my popcount() routine. I was using the routine:

int popcount_1(const bitboard bb) {
   register int cnt=0;
   register bitboard a=bb;
   while (a) {
      cnt += bytecount[a & 0xff];
      a >>= 8;
   }
   return cnt;
}

where bytecount is a looup table for byte counts.

I changed this routine a bit to

int popcount_2(const bitboard bb) {
   register int cnt=0;
   unsigned long a1 = *((unsigned long*)&bb);
   unsigned long a2 = *(((unsigned long*)&bb)+1);
   while (a1) {
      cnt += bytecount[a1 & 0xff];
      a1 >>= 8;
   }
   while (a2) {
      cnt += bytecount[a2 & 0xff];
      a2 >>= 8;
   }
   return cnt;
}

and the routine popcount_2 was 2x times faster than popcount_1. It is even
faster than the assembly routine provided by AMD for popcounts. I tested the
routines using 1000 randomly generated bitboards.
Do I miss something here or it is an improvement? Have anyone else tried a
similar idea?

My computer specs are:
AMD Athlon XP 1600+
512 MB RAM

Re: PopCount optimization Gerd Isenberg 08:25:49 03/11/04
Re: PopCount optimization Anthony Cozzie 08:05:30 03/11/04
Re: PopCount optimization Richard Pijl 07:59:28 03/11/04
- Re: PopCount optimization Dann Corbit 15:56:56 03/11/04

This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.