Author: Vincent Diepeveen
Date: 05:11:13 08/08/02
Go up one level in this thread
On August 08, 2002 at 07:32:12, Tim Foden wrote:
>On August 08, 2002 at 04:49:43, Russell Reagan wrote:
>
>>First question...
>>
>>In Crafty, there is a function FirstOne() which will return the least
>>significant bit set in a bitboard.
>
><ramble>
>If you look carefully you will see that the code contains bsr's and checks the
>high dword first, so Crafty's FirstOne() will return the index of the most
>significant set bit. Useful to know if you are trying to use it in your
>program! :)
></ramble>
>
>>FORCEINLINE int FirstOne(BITBOARD a) {
>> __asm {
>> bsr edx, dword ptr a+4
>> mov eax, 31
>> jnz l1
>> bsr edx, dword ptr a
>> mov eax, 63
>> jnz l1
>> mov edx, -1
>> l1: sub eax, edx
>> }
>>}
>>
>>Now I've always heard that you should pass by pointer or reference if you are
>>dealing with a non-standard type, which I assume a 64-bit int would be on a
>>32-bit machine.
>
>I would say that this is in general a good rule (using const ref), but most of
>the (32bit) compilers used can deal with 64 bit values in registers (by using
>pairs of registers), so passing by value here (and inlining if possible) is
>probably OK. (IMO anyway)
>
>>Is there any speed hit from passing by value in this function?
>
>I wouldn't be much, if any.
>
>>Or is the reason it works due to inlining the function?
>
>Inlining may remove any speed penalty, if the values end up being passed in
>registers.
>
>>
>>Second question...
>>
>>I have seen different ASM used in different engines to accomplish this same task
>>of determining the least significant set bit. This seems to be the shortest, but
>>I don't see it in any engines.
>>
>>bsf eax, dword ptr a+4
>>add eax, 32
>>bsf eax, dword ptr a
>
>This is very short, but in tests with my Athlon XP, it is no faster than the
>version with branches and checks for zeroed bitmaps.
that's of course a dumb test to do. you should try each time a random
chosen bitmap. A good way is to use a bunch of bitmaps (like 500)
and have a for loop: for f=1 to 500 try to get a bit from it.
Of course each bitmap with a different fill.
Then you'll see a clear difference. Factor 30 or so.
Best regards,
Vincent
>>The closest I see is Beowulf with:
>>
>>mov ecx,dword ptr [bits+4]
>>mov ebx,dword ptr [bits]
>>bsf eax,ecx
>>add eax,32
>>bsf eax,ebx
>>
>>Is the "problem" with this that you have to assume that the bitboard isn't equal
>>to zero?
>
>Not really. It's just that bsf can be quite slow, and here it is always called
>twice. So you can save some time on an Athlon by branching, and only calling it
>once.
>
>In Green Light I have tested a number of routines. I generally use a
>FindBitAndRemoveIt type of routine, as in the vast majority of cases you not
>only want to find a bit, you also want to remove it. Here are the best two
>routines I have found for this so far:
>
>inline Square BitScanForwardRemove( UINT64& bitmap )
>{
> __asm
> {
> mov ebx, [bitmap]
> mov edx, 1
>
> bsf ecx, [ebx]
> mov eax, 0
> jnz found
>
> bsf ecx, [ebx + 4]
> mov eax, 32
> lea ebx, [ebx + 4]
> jnz found
>
> mov eax, XX
> jmp done
>
> found:
> shl edx, cl
> add eax, ecx
> xor [ebx], edx
>
> done:
> }
>
> // value returned in eax
>}
>
>or (if you prefer one with no branches, and no test for a zero bitmap):
>
>inline Square BitScanForwardRemove( UINT64& bitmap )
>{
> ASSERT( bitmap );
> __asm
> {
> mov ebx, [bitmap]
> mov eax, 0
> cmp dword ptr [ebx], eax
> mov edx, 1
> setz al
> bsf ecx, [ebx + eax * 4]
> lea ebx, [ebx + eax * 4]
> shl eax, 5
> shl edx, cl
> add eax, ecx
> xor [ebx], edx
> }
>
> // value returned in eax
>}
>
>
>>If so, why is that a problem? Where do you need to use FirstOne() where you
>>don't know if there are any bits set or not? It seems logical to me that the
>>since you don't need to verify that the bitboard is not null most of the time,
>>it is redundant and only wastes time. For example, if your FirstOne() function
>>tests to make sure that the bitboard isn't null, you are doing extra work in the
>>case of generating legal moves:
>
>Except that the branch that runs if the bitmap is zero will almost always be
>predicted correctly on modern processors. Again, in tests on my Athlon, I found
>my first routine above to result in the same speed whether it checked for the
>zero case or not. So I don't think it is a time issue. It uses some more
>bytes, but you could of course have 2 versions of the routine, one that checks
>and one that doesn't.
>
>Cheers, Tim.
>
>>while (whiteKnights) { // Make sure whiteKnights isn't null
>> int index = FirstOne(whiteKnights); // Make sure _AGAIN_
>> DoKnightStuff(index);
>> Clear(whiteKnights,index);
>>}
>>
>>Thanks,
>>Russell
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.