Author: Bo Persson
Date: 14:00:25 08/08/02
Go up one level in this thread
On August 08, 2002 at 13:07:41, Miguel A. Ballicora wrote:
>On August 08, 2002 at 04:49:43, Russell Reagan wrote:
>
>>First question...
>>
>>In Crafty, there is a function FirstOne() which will return the least
>>significant bit set in a bitboard.
>>
>>FORCEINLINE int FirstOne(BITBOARD a) {
>> __asm {
>> bsr edx, dword ptr a+4
>> mov eax, 31
>> jnz l1
>> bsr edx, dword ptr a
>> mov eax, 63
>> jnz l1
>> mov edx, -1
>> l1: sub eax, edx
>> }
>>}
>>
>>Now I've always heard that you should pass by pointer or reference if you are
>>dealing with a non-standard type, which I assume a 64-bit int would be on a
>>32-bit machine. Is there any speed hit from passing by value in this function?
>>Or is the reason it works due to inlining the function?
>>
>>Second question...
>>
>>I have seen different ASM used in different engines to accomplish this same task
>>of determining the least significant set bit. This seems to be the shortest, but
>>I don't see it in any engines.
>>
>>bsf eax, dword ptr a+4
>>add eax, 32
>>bsf eax, dword ptr a
>>
>>The closest I see is Beowulf with:
>>
>>mov ecx,dword ptr [bits+4]
>>mov ebx,dword ptr [bits]
>>bsf eax,ecx
>>add eax,32
>>bsf eax,ebx
>>
>>Is the "problem" with this that you have to assume that the bitboard isn't equal
>>to zero?
>>
>>If so, why is that a problem? Where do you need to use FirstOne() where you
>>don't know if there are any bits set or not? It seems logical to me that the
>>since you don't need to verify that the bitboard is not null most of the time,
>>it is redundant and only wastes time. For example, if your FirstOne() function
>>tests to make sure that the bitboard isn't null, you are doing extra work in the
>>case of generating legal moves:
>>
>>while (whiteKnights) { // Make sure whiteKnights isn't null
>> int index = FirstOne(whiteKnights); // Make sure _AGAIN_
>> DoKnightStuff(index);
>> Clear(whiteKnights,index);
>>}
>
>Should not this setup be faster on a 32 bit machine? This takes the branches out
>of the loop, which is a common optimization procedure, is not it?
>Maybe it does not matter for almost empty bitboards.
>
>typedef unsigned int uint32_t;
>
>int FirstOne32(uint32_t x)
>{
> /* return a FirstOne on a 32 bit integer; no branches here */
>
>}
>
>uint32_t whiteKnights_L = GetLower32 (whiteKnights);
>uint32_t whiteKnights_H = GetHigher32 (whiteKnights);
>
>while (whiteKnights_L) {
> int index = FirstOne32(whiteKnights_L);
> DoKnightStuff(index);
> Clear(whiteKnights,index);
>}
>while (whiteKnights_H) {
> int index = 32 + FirstOne32(whiteKnights_H);
> DoKnightStuff(index);
> Clear(whiteKnights,index);
>}
>
It really does the same thing! :-)
After inlining, Bob's
while (whiteKnights)
will load both halves of the bitmap into two registers, and basically OR them
together to see if the result is 0. If not, FirstOne will find its parameter
already available, and can test it *really* fast!
Duplicating the code, like in your case, might save a jump, but also increases
the risk of a cache miss for the code, because it is larger.
Bo Persson
bop2@telia.com
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.