Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: There is huge potential to improve further

Author: Matt Taylor

Date: 19:44:43 07/16/03

Go up one level in this thread


On July 10, 2003 at 21:05:09, Russell Reagan wrote:

>Thanks Gerd. This method seems approximately equal to the bsf assembler method
>in speed on my Athlon (maybe a hair slower). Maybe this will be the fastest on a
>64-bit cpu?

Full time to compute is 10 cycles assuming cache hits. Time to bsf is 8 cycles +
setup. On an Opteron, it is possible that this is faster than the built-in
64-bit bsf instruction. Multiplication is faster and bsf should be 1 cycle
slower, but I don't have access to all the details. I have a friend from AMD who
happens to have one, and he's promised me that (eventually, time permitting) I
can go play around with it. We'll see...

I'm glad that someone was able to identify the algorithm. I knew I had seen it
somewhere before. It is beyond my math background.

Note that to do a 64-bit bitscan I am using 32-bit multiplication. There are
multiple reasons for this. The foremost is that a 32-bit multiply is fast on our
32-bit CPUs and even faster on an Opteron.

Here's some code:

int tbl[64] =
{ 0, 31,  4, 33, 60, 15, 12, 34, 61, 25, 51, 10, 56, 20, 22, 35,
 62, 30,  3, 54, 52, 24, 42, 19, 57, 29,  2, 44, 47, 28,  1, 36,
 63, 32, 59,  5,  6, 50, 55,  7, 16, 53, 13, 41,  8, 43, 46, 17,
 26, 58, 49, 14, 11, 40,  9, 45, 21, 48, 39, 23, 18, 38, 37, 27};

int lsb(uint64 *bb)
{
	uint64 bit = *bb;
	uint32 lo, high;

	bit = bit & -bit;
	*bb ^= bit;

	bit--;

	lo = (uint32)(bit >> 32);
	high = (uint32)(bit >> 0);

	return tbl[((lo ^ high) * 0x78291ACF) >> 26];
}

Happy computing!

-Matt



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.