Author: Gerd Isenberg
Date: 13:23:30 01/13/05
On January 13, 2005 at 14:40:50, Aart J.C. Bik wrote:

>
>The following attempt for the 64-bit version will vectorize, but I see no
>speedup over the sequential compilation of the same implementation (it is faster
>than the original source code with shift, however):
>
>unsigned int bits32[32]; /* precompute shifts */
>
>int dotProduct64(unsigned __int64 bb, unsigned char weight[])
>{
>  int i;
>  int sum = 0;
>  unsigned int b1 = bb;
>  unsigned int b2 = bb>>32;
>#pragma ivdep
>#pragma vector aligned
>  for (i=0; i < 32; i++) {
>    if (b1 & bits32[i]) sum += weight[i];
>    if (b2 & bits32[i]) sum += weight[i+32];
>  }
>  return sum;
>}

Yes - I guess you are ambitious about 128-bit ALUs as well. 38 AMD64 cycles to beat ;-)

Gerd
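For anyone who wants to try the quoted loop outside the Intel compiler, here is a portable C sketch, under a few stated assumptions: `uint64_t` stands in for the compiler-specific `unsigned __int64`, the Intel `#pragma ivdep` / `#pragma vector aligned` lines are dropped, and `init_bits32` is a hypothetical helper, since the post only says the shifts are precomputed. The branch-free variant and the shift-based reference loop are not from the thread; they are included only to cross-check results, and no timing claims are made for any of them.

```c
#include <assert.h>
#include <stdint.h>

/* bits32[i] == 1u << i, i.e. the "precompute shifts" table from the post.
   init_bits32 is a hypothetical helper that fills it once. */
static unsigned int bits32[32];

static void init_bits32(void)
{
    for (int i = 0; i < 32; i++)
        bits32[i] = 1u << i;
}

/* Same structure as the quoted dotProduct64: split the bitboard into two
   32-bit halves and walk both with one 32-iteration loop. */
static int dotProduct64(uint64_t bb, const unsigned char weight[64])
{
    int sum = 0;
    unsigned int b1 = (unsigned int)bb;          /* squares 0..31  */
    unsigned int b2 = (unsigned int)(bb >> 32);  /* squares 32..63 */
    for (int i = 0; i < 32; i++) {
        if (b1 & bits32[i]) sum += weight[i];
        if (b2 & bits32[i]) sum += weight[i + 32];
    }
    return sum;
}

/* Branch-free variant (not from the thread): turn each bit into a
   0 or all-ones mask and AND it with the weight instead of branching. */
static int dotProduct64Branchless(uint64_t bb, const unsigned char weight[64])
{
    int sum = 0;
    for (int i = 0; i < 64; i++) {
        int mask = -(int)((bb >> i) & 1);  /* 0 if bit clear, -1 if set */
        sum += weight[i] & mask;
    }
    return sum;
}

/* Straightforward shift-based reference used only for cross-checking. */
static int dotProductRef(uint64_t bb, const unsigned char weight[64])
{
    int sum = 0;
    for (int i = 0; i < 64; i++)
        if (bb & ((uint64_t)1 << i))
            sum += weight[i];
    return sum;
}

/* Asserts that all three versions agree and returns the common result. */
static int check_dot_products(uint64_t bb, const unsigned char weight[64])
{
    int ref = dotProductRef(bb, weight);
    assert(dotProduct64(bb, weight) == ref);
    assert(dotProduct64Branchless(bb, weight) == ref);
    return ref;
}
```

Call `init_bits32()` once before using `dotProduct64`. The branch-free form replaces the conditional adds with AND masks, which is the kind of loop body a vectorizer can often map onto SIMD compare/AND/add instructions; whether a particular compiler actually does so, and whether it pays off against the scalar loop, has to be measured.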
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.