Author: Anthony Cozzie
Date: 09:39:17 07/20/04
On July 20, 2004 at 12:31:59, Anthony Cozzie wrote:

>What do you think of the following C code:
>
>int bb_dot_product(bitboard a, unsigned char *weights)
>{
>  bitboard t, t1, sum = 0, *_weights = (bitboard *)weights;
>  int i;
>  static bitboard table[256] = { /* correct translations, e.g. 0xFF -> 0xFFFFFFFFFFFFFFFF */ };
>
>  //we count on the compiler to unroll this loop.
>  for(i = 0; i < 8; i++) {
>    t = table[(a >> i*8) & 0xFF] & _weights[i];
>    t1 = t >> 8;
>    sum += (t & 0x00FF00FF00FF00FF) + (t1 & 0x00FF00FF00FF00FF);
>  }
>
>  sum = (sum & 0x0000FFFF0000FFFF) + ((sum >> 16) & 0x0000FFFF0000FFFF);
>  sum = (sum >> 32) + (sum & 0x00000000FFFFFFFF);
>  return (int)sum;
>}
>
>It has several advantages: it can use the full 0-255 range for each weight, the
>table does not have to be rotated, and there is no penalty for moving between
>the integer and MMX pipes.
>
>OTOH, this solution is also much less cache friendly, requiring maybe 2x the
>number of instructions and also needing 2KB of data cache.
>
>anthony

Could someone with a 64-bit MSVC compile this code and post the assembly? (And if they were really nice, benchmark it? :)

anthony
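
For anyone who wants to try the idea without MSVC, here is a minimal, self-contained sketch of the same table-plus-SWAR approach in portable C. It is only one reading of the routine quoted above, not a tested replacement for it. The helper names `expand_table`, `init_expand_table` and `load_weights8` are made up for illustration. The sketch also assumes that weight `8*i + k` belongs to bit `k` of byte `i` of the bitboard, and it assembles each group of eight weights with shifts rather than a pointer cast so that it does not depend on endianness or alignment.

```c
#include <stdint.h>

typedef uint64_t bitboard;

/* Hypothetical helper table: 256 entries * 8 bytes = 2KB.
   If bit k of the index is set, byte k of the entry is 0xFF. */
static uint64_t expand_table[256];

/* Must be called once before bb_dot_product. */
static void init_expand_table(void)
{
    for (int b = 0; b < 256; b++) {
        uint64_t m = 0;
        for (int k = 0; k < 8; k++)
            if (b & (1 << k))
                m |= (uint64_t)0xFF << (8 * k);
        expand_table[b] = m;
    }
}

/* Pack 8 consecutive weights into one word, w[0] in the low byte. */
static uint64_t load_weights8(const unsigned char *w)
{
    uint64_t v = 0;
    for (int k = 0; k < 8; k++)
        v |= (uint64_t)w[k] << (8 * k);
    return v;
}

/* Sum of weights[sq] over every set bit sq of a (weights has 64 entries). */
int bb_dot_product(bitboard a, const unsigned char *weights)
{
    const uint64_t LANES16 = 0x00FF00FF00FF00FFULL;
    const uint64_t LANES32 = 0x0000FFFF0000FFFFULL;
    uint64_t sum = 0;

    for (int i = 0; i < 8; i++) {
        /* Keep only the weights whose bit is set in byte i of a. */
        uint64_t t = expand_table[(a >> (8 * i)) & 0xFF]
                   & load_weights8(weights + 8 * i);
        /* Add even and odd bytes into four 16-bit lanes.
           Each lane grows by at most 2 * 255 per pass, so at most 4080 total. */
        sum += (t & LANES16) + ((t >> 8) & LANES16);
    }

    /* Fold four 16-bit lanes into two 32-bit lanes, then into one value. */
    sum = (sum & LANES32) + ((sum >> 16) & LANES32);
    sum = (sum >> 32) + (sum & 0xFFFFFFFFULL);
    return (int)sum;
}
```

Call `init_expand_table()` once at startup, then `bb_dot_product(board, weights)` with a 64-entry weight array. The worst case (all 64 bits set, every weight 255) is 16320, so no lane can overflow. The per-call `load_weights8` is there for portability only; a real engine would presumably pre-pack the weights into eight 64-bit words once.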
Last modified: Thu, 15 Apr 21 08:11:13 -0700