Author: Matt Taylor
Date: 19:44:43 07/16/03
Go up one level in this thread
On July 10, 2003 at 21:05:09, Russell Reagan wrote: >Thanks Gerd. This method seems approximately equal to the bsf assembler method >in speed on my Athlon (maybe a hair slower). Maybe this will be the fastest on a >64-bit cpu? Full time to compute is 10 cycles assuming cache hits. Time to bsf is 8 cycles + setup. On an Opteron, it is possible that this is faster than the built-in 64-bit bsf instruction. Multiplication is faster and bsf should be 1 cycle slower, but I don't have access to all the details. I have a friend from AMD who happens to have one, and he's promised me that (eventually, time permitting) I can go play around with it. We'll see... I'm glad that someone was able to identify the algorithm. I knew I had seen it somewhere before. It is beyond my math background. Note that to do a 64-bit bitscan I am using 32-bit multiplication. There are multiple reasons for this. The foremost is that a 32-bit multiply is fast on our 32-bit CPUs and even faster on an Opteron. Here's some code: int tbl[64] = { 0, 31, 4, 33, 60, 15, 12, 34, 61, 25, 51, 10, 56, 20, 22, 35, 62, 30, 3, 54, 52, 24, 42, 19, 57, 29, 2, 44, 47, 28, 1, 36, 63, 32, 59, 5, 6, 50, 55, 7, 16, 53, 13, 41, 8, 43, 46, 17, 26, 58, 49, 14, 11, 40, 9, 45, 21, 48, 39, 23, 18, 38, 37, 27}; int lsb(uint64 *bb) { uint64 bit = *bb; uint32 lo, high; bit = bit & -bit; *bb ^= bit; bit--; lo = (uint32)(bit >> 32); high = (uint32)(bit >> 0); return tbl[((lo ^ high) * 0x78291ACF) >> 26]; } Happy computing! -Matt
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.