Author: Anthony Cozzie
Date: 09:31:59 07/20/04
What do you think of the following C code:

    int bb_dot_product(bitboard a, unsigned char *weights)
    {
        bitboard t, t1, sum = 0;
        bitboard *_weights = (bitboard *)weights;
        int i;
        static bitboard table[256] = { /* correct translations, e.g. 0xFF -> 0xFFFFFFFFFFFFFFFF */ };

        // we count on the compiler to unroll this loop.
        for (i = 0; i < 8; i++) {
            // expand 8 bits of a into 8 byte masks and select the matching weights
            t = table[(a >> (i * 8)) & 0xFF] & _weights[i];
            // t supplies the even bytes, t1 the odd bytes
            t1 = t >> 8;
            sum += (t & 0x00FF00FF00FF00FF) + (t1 & 0x00FF00FF00FF00FF);
        }
        // fold the four 16-bit partial sums into one total
        sum = (sum & 0x0000FFFF0000FFFF) + ((sum >> 16) & 0x0000FFFF0000FFFF);
        sum = ((sum >> 32) + sum) & 0x00000000FFFFFFFF;
        return sum;
    }

It has several advantages: it can use the full 0-255 range for each weight, the table does not have to be rotated, and there is no penalty for moving between the integer and MMX pipes.

OTOH, this solution is also much less cache friendly: it requires maybe 2x the number of instructions and also needs 2KB of data cache for the table (256 entries of 8 bytes).

anthony
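For anyone who wants to try the idea, here is a minimal self-contained sketch of the routine above. Several things in it are assumptions rather than part of the original post: bitboard is taken to be uint64_t, the lookup table is filled at startup by a hypothetical init_expand_table() instead of a static initializer, the weights are loaded with memcpy on a little-endian machine (so weights[8*i + k] pairs with bit 8*i + k of the board), and bb_dot_product_ref() is a plain bit-by-bit loop added only to check the result.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t bitboard;              /* assumption: 64-bit board type */

static bitboard expand_table[256];      /* 256 * 8 bytes = 2KB */
static int expand_table_ready = 0;

/* Hypothetical helper: bit k of the index becomes byte k (0xFF) of the entry. */
static void init_expand_table(void)
{
    for (int b = 0; b < 256; b++) {
        bitboard m = 0;
        for (int k = 0; k < 8; k++)
            if (b & (1 << k))
                m |= (bitboard)0xFF << (8 * k);
        expand_table[b] = m;
    }
    expand_table_ready = 1;
}

/* Sum of weights[sq] over every set bit sq of a; weights has 64 entries. */
static int bb_dot_product(bitboard a, const unsigned char *weights)
{
    bitboard sum = 0;

    assert(expand_table_ready);
    for (int i = 0; i < 8; i++) {
        bitboard w, t;
        /* assumption: little-endian, so byte k of w is weights[8*i + k] */
        memcpy(&w, weights + 8 * i, sizeof w);
        t = expand_table[(a >> (8 * i)) & 0xFF] & w;
        /* even bytes and odd bytes go into four 16-bit lanes */
        sum += (t & 0x00FF00FF00FF00FFULL) + ((t >> 8) & 0x00FF00FF00FF00FFULL);
    }
    /* each lane holds at most 16 * 255 = 4080, so no lane overflows */
    sum = (sum & 0x0000FFFF0000FFFFULL) + ((sum >> 16) & 0x0000FFFF0000FFFFULL);
    sum = ((sum >> 32) + sum) & 0x00000000FFFFFFFFULL;
    return (int)sum;
}

/* Reference version used only for checking: one bit at a time. */
static int bb_dot_product_ref(bitboard a, const unsigned char *weights)
{
    int sum = 0;
    for (int sq = 0; sq < 64; sq++)
        if ((a >> sq) & 1)
            sum += weights[sq];
    return sum;
}
```

Call init_expand_table() once before the first bb_dot_product() call. Comparing against bb_dot_product_ref() on a few boards is a cheap way to confirm the lane arithmetic on a given compiler and platform.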