Author: Walter Faxon
Date: 12:47:40 12/22/05
Go up one level in this thread
On December 22, 2005 at 14:25:24, Gerd Isenberg wrote: >Hi, > >for all bitScan-collectors - just another one ;-) > >It is not particular small, but portable, branchless, has cheap instructions and >no memory lookup. It seems faster than those based on double conversion (inlined >24 versus 40 amd64 cycles (32-bit mode, measured by rdtsc). > >unsigned int bitScanReverse(BitBoard bb) >{ > unsigned int l, h, i, m; > h = (unsigned int)(bb >> 32); > l = (unsigned int) bb; > m = h != 0; > i = m << 5; // 0|32 > l = (h & -m) | (l & m-1); > m = (l > 0xffff) << 4; // 0|16 > i += m; > l >>= m; > m = ((0xff-l)>>16) & 8; // 0|8 > i += m; > l >>= m; > m = ((0x0f-l)>> 8) & 4; // 0|4 > l >>= m; > return i + m + ((0xffffaa50u >> (2*l)) & 3); >} > >It is based on the conditional "divide and conquer" one, posted here by Eugene >Nalimov. The folding is implemented branchless, eg. the word/byte folding: > >if (l & 0xff00) { > i += 8; > l >>= 8; >} > >m = (l > 0xff) << 3; >i += m; >l >>= m; > >or > >m = ((0xff-l)>>16) & 8; >i += m; >l >>= m; > >While Eugene used a final lookup[foldedByte] >or with only 16-bit folding lookup[foldedWord], >here the final pre-lookup step is folding from byte to nibble for an >in-register-lookup with 2bit[16], as suggested by Paul Womack in comp.arch on >May 15, 2000. > >Gerd Very clever! Thanks, Gerd! -Walter
This page took 0.01 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.