# Computer Chess Club Archives

## Messages

### Subject: Re: Matt Taylor's magic de Bruijn Constant

Author: Bas Hamstra

Date: 10:15:51 07/13/03

Go up one level in this thread

```Hi Gerd,

First of all: the 25 should be 255 indeed.

On July 13, 2003 at 12:51:45, Gerd Isenberg wrote:

>On July 13, 2003 at 06:57:48, Bas Hamstra wrote:
>
>>Coicidentally I am working a bit on LastOne too right now. I also have a Athlon
>>XP (like you I believe). So BSF isn't too fast. The question is: can you
>>outperform it? At first it seems so, in a zillion times loop the following code
>>appeared to be nearly a factor 3 faster than the BSF routine below:
>>
>>const char Tabel[64] =
>>		{	0, 0, 0,15, 0, 1,28, 0,16, 0, 0, 0, 2,21,29, 0,
>>			0, 0,19,17,10, 0,12, 0, 0, 3, 0, 6, 0,22,30, 0,
>>			14, 0,27, 0, 0, 0,20, 0,18, 9,11, 0, 5, 0, 0,13,
>>			26, 0, 0, 8, 0, 4, 0,25, 0, 7,24, 0,23, 0,31, 0
>>		};
>>
>>
>>__forceinline int LastOne2(unsigned __int64 Bits)
>>
>>{	unsigned int *p = (unsigned int*) &Bits;
>>	unsigned int Short;
>>
>>	if(p[0])
>>	{	Short = p[0];
>>		Short = Short ^ (Short-1);
>>		Short *= 7*255*255*25;  // ?? 25 or 255 doesn't matter?
>>		Short >>= 26;
>>		return Tabel[Short&63];
>>	}
>>	Short = p[1];
>>	Short = Short ^ (Short-1);
>>	Short *= 7*255*255*255;
>>	Short >>= 26;
>>	return Tabel[Short&63]+32;
>>}
>>
>>VC produces very efficient asm code for it, as far as I can judge. However how
>>about the overall performance in the program? It seems to be noticably slower
>>than plain BSF, like below:
>>
>>__forceinline int LastOne(BB M)
>>{   __asm
>>    {       mov EAX, dword ptr [M]
>>            XOR EDX, EDX
>>            cmp EAX, 0
>>            jnz L2
>>            mov EAX, dword ptr [M+4]
>>        L2: bsf EAX, EAX
>>    }
>>}
>>
>>Both routines have an extra branch to test the first 32 bits, so that is not the
>>problem!?
>>
>
>Hi Bas,
>
>LastOne - FirstOne, Ok it depends on the starting point 0 or 63.
>What about nullTo63One? Even with LS1B and MS1B is confusion.
>
>Nice constants, but they map 32-bit to a range of 64 or 60 - about the double
>space as necessary ;-) Ahh, you use signed char for table, may be a additional
>and 0xff or movsx instead of mov. Why not using de Bruijn constants, that map
>to a range of 32?

>A loop test is interesting, but code size matters a lot in programs that often
>inlines "small" functions. Also memory lookup seems to be an issue here - even
>with this cache friendly size.

So it seems, yes. And it makes me suspect that a lot of other frequently used
simple lookups suffer the same (10 clock or something) penalty. You start to
wonder how fast a program could run if program+hash would fit in cache. I have
seen a couple of LastBit routines that are very worried about avoiding branches,
but then need a table lookup. Doesn't make sense then?!

>My intention is not to outperform bsf anymore - but to use a more bitboard
>metric approach in "future" 64-bit IsiChess.
>
>1. Try to avoid traversing bitboards at all.
>
>2. If traversing bitboards, use x & -x to isolate single bits - and eg. for
>moves use a fast hashkey function, to map move from/to bitboards to an
>appropriate value-range.

Interesting, but not easy I would think. Hashing ok (did you solve that problem
actually?), but at some point you will have to know where the pieces are. Take
Swap() for instance. Direct attacks in Swap() you could do without LastBit, but
how about discovered ones? You would have to know where the pieces are.

Best regards,
Bas.

```