Author: Eugene Nalimov
Date: 21:59:58 01/22/02
Go up one level in this thread
On January 23, 2002 at 00:22:12, Robert Hyatt wrote:
>On January 22, 2002 at 12:22:15, Leen Ammeraal wrote:
>
>>Hi Miguel,
>>I tested Dann's version, LOWBIT, not your routine.
>>I forgot to write that I admire
>>this almost magical algorithm very much and that I would
>>be very pleased if someone could explain it.
>
>
>The idea is very simple. Test the end of the 64 bit value that has
>bits 32-63 with a bit scan instruction (which will give a result of
>0-31. Add 32 to this to adjust to 32-63. Now test the other end. If
>the bit scan finds no bits set, it will leave the dest register unchanged
>which leaves the result of your high-order end in there. If it does find
>a bit set (0-31) it will overwrite the destination and the result will be
>0-31.
>
>Works well. Not quite so easy to do for "hibit" as a branch is going to
>be necessary...
Unfortunately, Intel's documentation does not say "Unchanged if there is no bit
set". It says "Undefined". Maybe on the current CPUs it works, but I strongly
recommend against such tricks.
Eugene
>
>
>
>If your
>>experiments show that your version is better, please let
>>me (or rather, us) know. I remember that this assembly
>>routine was faster than the C(++) routine that I replaced
>>(a long time ago).
>>By the way, it is a long time since I wrote assembly
>>language myself, and I don't understand this routine
>>either. In particular, I find it miraculous that it
>>contains neither a loop nor a table-lookup. Any explanation
>>would be very welcome, preferably in a C equivalent,
>>if possible, just to show how it works.
>>
>>Leen
>>
>>
>>
>>On January 22, 2002 at 12:01:07, Miguel A. Ballicora wrote:
>>
>>>On January 22, 2002 at 11:23:36, Leen Ammeraal wrote:
>>>
>>>>I did an experiment with your function lowbit, adapting
>>>>it (by using a union of both an __int64 and a
>>>>struct of two ints, in view of 64 bit bitboards)
>>>>but it was slightly slower (with very little difference)
>>>>than the following assembly routine which you (Dann) also
>>>>provided:
>>>>
>>>>__forceinline
>>>>int FirstPiece(BITBOARD bits)
>>>>{
>>>>// Given is: bits != 0
>>>>__asm {
>>>>; 64-bit number to move into ECX:EBX, will never be zero!
>>>> mov ecx,dword ptr [bits+4]
>>>> mov ebx,dword ptr [bits]
>>>> bsf eax,ecx
>>>> add eax,32
>>>> bsf eax,ebx
>>>> }
>>>>}
>>>>
>>>>So I think I will stick to this assembly function.
>>>>Leen
>>>
>>>Hi Leen,
>>>
>>>I am a bit confused. You answer to Dann but you talk about the function lowbit()
>>>I posted a function lowbit and Dann posted a macro LOWBIT (by L.Kirby).
>>>Which one you say you tested and is slightly slower than the assembler version?
>>>
>>>If the difference is not high, maybe the C version of (LOWBIT or lowbit or
>>>whatever) could be optimized and or coded in assembler.
>>>
>>>Regards,
>>>Miguel
>>>PS: did you test lowbit2?
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.