Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: The fastest Find_LSB I can get (on 32 hardware).

Author: Gerd Isenberg

Date: 01:39:11 02/04/04

Go up one level in this thread


On February 03, 2004 at 23:34:17, Mathieu Pagé wrote:

>Hi, I past the day trying different algorithm to improve my Find_LSB functions
>and endded with one that make my engine 7.67% faster than my old one.
>
>I think the old one is one created by E. Nalimov. it consist of a binary search
>of the LSB and finally with a 8bits lookup table.
>
>My new one use the asm instruction bsf.
>
><code language="C++">
>
>struct HiLo
>{
>   unsigned int Lo;
>   unsigned int Hi;
>};
>
>unsigned int __declspec(naked) __fastcall findFirstBitTrue(unsigned int z)
>{
>   __asm
>   {
>      bsf eax,ecx
>      ret
>   }
>}
>
>int GetLSB(BitBoard& bit)
>{
>   register int iPos;
>
>   if (reinterpret_cast< HiLo* >(&bit)->Lo != 0)
>   {
>      iPos = LSB_32(reinterpret_cast< HiLo* >(&bit)->Lo);
>   }
>   else
>   {
>      iPos = LSB_32(reinterpret_cast< HiLo* >(&bit)->Hi) + 32;
>   }
>
>   bit = (bit & (bit - 1));
>   return iPos;
>}
>
></code>
>
>The GetLSB function return the position of the LSB of a BitBoard (unsigned
>__int64).
>
>This code will only compile under msvc due to the asm , __fastcall and
>__declspec.
>
>It is the fastest code I could come with. Another version without the
>"__declspec(naked) __fastcall" (inline instead) was a little bit slower making
>my engine about 1.7 % slower.
>
>The "reinterpret_cast< HiLo* >(&bit)->Lo" thing is used to access the high and
>low part of the Bitboard without using any 64 bit operator (& or >>). I have not
>compile any data about how faster it is but I tested it and it is faster than a
>thing like : "bit >> 32" or "bit & 0xffffffff".
>
>I post it here for two reason. Maybe it could be used by someone and maybe
>someone have a still faster implementation.
>
>If you think of anything else I could test, please let me know.

There are "hundreds" of (64-bit) bitscans in the archives, based on bsf/bsr,
lookup, genius folding tricks (Walter Faxon), De Bruijn multiplication, float
conversion and more...

Your approach is fine, but the drawback with "naked" for this very small
subroutine is relative expensive call/ret overhead, considering the small body.
But inlining with ms inline asm contradicts fastcall, therefore it requires
parameter passing via memory. Even if the word is already inside a register,
there is an additional store/load with msc inline assembly. GCC inline assemby
is smarter - and probably one reason why inline assembly is not longer possible
with latest ms-compiler. There are "assembler like" intrinsics for that, leaving
register allocation and further optimization to the compiler - similar to GCC
inline assembly.

In my current inlined asm routine for traversing bitboards i found the best
compromise in passing a bitboard reference/pointer with reset the found bit.

Gerd



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.