# Computer Chess Club Archives

## Messages

### Subject: Re: Matt Taylor's magic de Bruijn Constant

Author: Gerd Isenberg

Date: 09:51:45 07/13/03

Go up one level in this thread

```On July 13, 2003 at 06:57:48, Bas Hamstra wrote:

>Coicidentally I am working a bit on LastOne too right now. I also have a Athlon
>XP (like you I believe). So BSF isn't too fast. The question is: can you
>outperform it? At first it seems so, in a zillion times loop the following code
>appeared to be nearly a factor 3 faster than the BSF routine below:
>
>const char Tabel[64] =
>		{	0, 0, 0,15, 0, 1,28, 0,16, 0, 0, 0, 2,21,29, 0,
>			0, 0,19,17,10, 0,12, 0, 0, 3, 0, 6, 0,22,30, 0,
>			14, 0,27, 0, 0, 0,20, 0,18, 9,11, 0, 5, 0, 0,13,
>			26, 0, 0, 8, 0, 4, 0,25, 0, 7,24, 0,23, 0,31, 0
>		};
>
>
>__forceinline int LastOne2(unsigned __int64 Bits)
>
>{	unsigned int *p = (unsigned int*) &Bits;
>	unsigned int Short;
>
>	if(p[0])
>	{	Short = p[0];
>		Short = Short ^ (Short-1);
>		Short *= 7*255*255*25;  // ?? 25 or 255 doesn't matter?
>		Short >>= 26;
>		return Tabel[Short&63];
>	}
>	Short = p[1];
>	Short = Short ^ (Short-1);
>	Short *= 7*255*255*255;
>	Short >>= 26;
>	return Tabel[Short&63]+32;
>}
>
>VC produces very efficient asm code for it, as far as I can judge. However how
>about the overall performance in the program? It seems to be noticably slower
>than plain BSF, like below:
>
>__forceinline int LastOne(BB M)
>{   __asm
>    {       mov EAX, dword ptr [M]
>            XOR EDX, EDX
>            cmp EAX, 0
>            jnz L2
>            mov EAX, dword ptr [M+4]
>        L2: bsf EAX, EAX
>    }
>}
>
>Both routines have an extra branch to test the first 32 bits, so that is not the
>problem!?
>

Hi Bas,

LastOne - FirstOne, Ok it depends on the starting point 0 or 63.
What about nullTo63One? Even with LS1B and MS1B is confusion.

Nice constants, but they map 32-bit to a range of 64 or 60 - about the double
space as necessary ;-) Ahh, you use signed char for table, may be a additional
and 0xff or movsx instead of mov. Why not using de Bruijn constants, that map to
a range of 32?

A loop test is interesting, but code size matters a lot in programs that often
inlines "small" functions. Also memory lookup seems to be an issue here - even
with this cache friendly size.

My intention is not to outperform bsf anymore - but to use a more bitboard
metric approach in "future" 64-bit IsiChess.

1. Try to avoid traversing bitboards at all.

2. If traversing bitboards, use x & -x to isolate single bits - and eg. for
moves use a fast hashkey function, to map move from/to bitboards to an
appropriate value-range.

Regards,
Gerd

```