Author: milix
Date: 07:31:22 03/11/04
Hello to everyone. I'm trying to optimize various parts of my bitboard engine and I found an optimization for my popcount() routine. I was using the routine: int popcount_1(const bitboard bb) { register int cnt=0; register bitboard a=bb; while (a) { cnt += bytecount[a & 0xff]; a >>= 8; } return cnt; } where bytecount is a looup table for byte counts. I changed this routine a bit to int popcount_2(const bitboard bb) { register int cnt=0; unsigned long a1 = *((unsigned long*)&bb); unsigned long a2 = *(((unsigned long*)&bb)+1); while (a1) { cnt += bytecount[a1 & 0xff]; a1 >>= 8; } while (a2) { cnt += bytecount[a2 & 0xff]; a2 >>= 8; } return cnt; } and the routine popcount_2 was 2x times faster than popcount_1. It is even faster than the assembly routine provided by AMD for popcounts. I tested the routines using 1000 randomly generated bitboards. Do I miss something here or it is an improvement? Have anyone else tried a similar idea? My computer specs are: AMD Athlon XP 1600+ 512 MB RAM
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.