Author: Dann Corbit
Date: 12:32:56 12/22/05
Go up one level in this thread
On December 22, 2005 at 14:25:24, Gerd Isenberg wrote:
>Hi,
>
>for all bitScan-collectors - just another one ;-)
>
>It is not particular small, but portable, branchless, has cheap instructions and
>no memory lookup. It seems faster than those based on double conversion (inlined
>24 versus 40 amd64 cycles (32-bit mode, measured by rdtsc).
>
>unsigned int bitScanReverse(BitBoard bb)
>{
> unsigned int l, h, i, m;
> h = (unsigned int)(bb >> 32);
> l = (unsigned int) bb;
> m = h != 0;
> i = m << 5; // 0|32
> l = (h & -m) | (l & m-1);
> m = (l > 0xffff) << 4; // 0|16
> i += m;
> l >>= m;
> m = ((0xff-l)>>16) & 8; // 0|8
> i += m;
> l >>= m;
> m = ((0x0f-l)>> 8) & 4; // 0|4
> l >>= m;
> return i + m + ((0xffffaa50u >> (2*l)) & 3);
>}
>
>It is based on the conditional "divide and conquer" one, posted here by Eugene
>Nalimov. The folding is implemented branchless, eg. the word/byte folding:
>
>if (l & 0xff00) {
> i += 8;
> l >>= 8;
>}
>
>m = (l > 0xff) << 3;
>i += m;
>l >>= m;
>
>or
>
>m = ((0xff-l)>>16) & 8;
>i += m;
>l >>= m;
>
>While Eugene used a final lookup[foldedByte]
>or with only 16-bit folding lookup[foldedWord],
>here the final pre-lookup step is folding from byte to nibble for an
>in-register-lookup with 2bit[16], as suggested by Paul Womack in comp.arch on
>May 15, 2000.
>
>Gerd
Very pretty.
P.S.
I always enjoy your posts.
This page took 0.01 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.