Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Two questions: FirstOne() in Crafty and BSF/BSR

Author: Vincent Diepeveen

Date: 05:11:13 08/08/02

Go up one level in this thread


On August 08, 2002 at 07:32:12, Tim Foden wrote:

>On August 08, 2002 at 04:49:43, Russell Reagan wrote:
>
>>First question...
>>
>>In Crafty, there is a function FirstOne() which will return the least
>>significant bit set in a bitboard.
>
><ramble>
>If you look carefully you will see that the code contains bsr's and checks the
>high dword first, so Crafty's FirstOne() will return the index of the most
>significant set bit.  Useful to know if you are trying to use it in your
>program!  :)
></ramble>
>
>>FORCEINLINE int FirstOne(BITBOARD a) {
>>  __asm {
>>        bsr     edx, dword ptr a+4
>>        mov     eax, 31
>>        jnz     l1
>>        bsr     edx, dword ptr a
>>        mov     eax, 63
>>        jnz     l1
>>        mov     edx, -1
>>  l1:   sub     eax, edx
>>  }
>>}
>>
>>Now I've always heard that you should pass by pointer or reference if you are
>>dealing with a non-standard type, which I assume a 64-bit int would be on a
>>32-bit machine.
>
>I would say that this is in general a good rule (using const ref), but most of
>the (32bit) compilers used can deal with 64 bit values in registers (by using
>pairs of registers), so passing by value here (and inlining if possible) is
>probably OK.  (IMO anyway)
>
>>Is there any speed hit from passing by value in this function?
>
>I wouldn't be much, if any.
>
>>Or is the reason it works due to inlining the function?
>
>Inlining may remove any speed penalty, if the values end up being passed in
>registers.
>
>>
>>Second question...
>>
>>I have seen different ASM used in different engines to accomplish this same task
>>of determining the least significant set bit. This seems to be the shortest, but
>>I don't see it in any engines.
>>
>>bsf eax, dword ptr a+4
>>add eax, 32
>>bsf eax, dword ptr a
>
>This is very short, but in tests with my Athlon XP, it is no faster than the
>version with branches and checks for zeroed bitmaps.

that's of course a dumb test to do. you should try each time a random
chosen bitmap. A good way is to use a bunch of bitmaps (like 500)
and have a for loop: for f=1 to 500 try to get a bit from it.

Of course each bitmap with a different fill.

Then you'll see a clear difference. Factor 30 or so.

Best regards,
Vincent

>>The closest I see is Beowulf with:
>>
>>mov ecx,dword ptr [bits+4]
>>mov ebx,dword ptr [bits]
>>bsf eax,ecx
>>add eax,32
>>bsf eax,ebx
>>
>>Is the "problem" with this that you have to assume that the bitboard isn't equal
>>to zero?
>
>Not really.  It's just that bsf can be quite slow, and here it is always called
>twice.  So you can save some time on an Athlon by branching, and only calling it
>once.
>
>In Green Light I have tested a number of routines.  I generally use a
>FindBitAndRemoveIt type of routine, as in the vast majority of cases you not
>only want to find a bit, you also want to remove it.  Here are the best two
>routines I have found for this so far:
>
>inline Square	BitScanForwardRemove( UINT64& bitmap )
>{
>	__asm
>	{
>		mov		ebx, [bitmap]
>		mov		edx, 1
>
>		bsf		ecx, [ebx]
>		mov		eax, 0
>		jnz		found
>
>		bsf		ecx, [ebx + 4]
>		mov		eax, 32
>		lea		ebx, [ebx + 4]
>		jnz		found
>
>		mov		eax, XX
>		jmp		done
>
>	found:
>		shl		edx, cl
>		add		eax, ecx
>		xor		[ebx], edx
>
>	done:
>	}
>
>	// value returned in eax
>}
>
>or (if you prefer one with no branches, and no test for a zero bitmap):
>
>inline Square	BitScanForwardRemove( UINT64& bitmap )
>{
>	ASSERT( bitmap );
>	__asm
>	{
>		mov		ebx, [bitmap]
>		mov		eax, 0
>		cmp		dword ptr [ebx], eax
>		mov		edx, 1
>		setz	al
>		bsf		ecx, [ebx + eax * 4]
>		lea		ebx, [ebx + eax * 4]
>		shl		eax, 5
>		shl		edx, cl
>		add		eax, ecx
>		xor		[ebx], edx
>	}
>
>	// value returned in eax
>}
>
>
>>If so, why is that a problem? Where do you need to use FirstOne() where you
>>don't know if there are any bits set or not? It seems logical to me that the
>>since you don't need to verify that the bitboard is not null most of the time,
>>it is redundant and only wastes time. For example, if your FirstOne() function
>>tests to make sure that the bitboard isn't null, you are doing extra work in the
>>case of generating legal moves:
>
>Except that the branch that runs if the bitmap is zero will almost always be
>predicted correctly on modern processors.  Again, in tests on my Athlon, I found
>my first routine above to result in the same speed whether it checked for the
>zero case or not.  So I don't think it is a time issue.  It uses some more
>bytes, but you could of course have 2 versions of the routine, one that checks
>and one that doesn't.
>
>Cheers, Tim.
>
>>while (whiteKnights) {                  // Make sure whiteKnights isn't null
>>    int index = FirstOne(whiteKnights); // Make sure _AGAIN_
>>    DoKnightStuff(index);
>>    Clear(whiteKnights,index);
>>}
>>
>>Thanks,
>>Russell



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.