Author: Frederik Tack
Date: 04:47:03 12/23/05
Go up one level in this thread
On December 22, 2005 at 16:29:40, Gerd Isenberg wrote:
>>Is this faster than Bob's bit routine in Crafty?
>>
>>Anyone compared?
>
>I guess not - it might be a nice portable backup.
>
>Iirc Crafty's FirstOne is based on leading zero count, so this bsr replacement
>is used very often with 63-bitScanReverse or so. It should be inlined for speed
>issues - but codesize will grow. Since the bitScan-intrinsic produces only one
>64-bit instruction - bsr reg64,reg64 which are only a 4 bytes instead of 100 or
>so for the C-routine. The codesize will explode a bit - if used often - which is
>likely in Crafty.
>
>Otoh the bsr reg64,reg64 is 9 cycles vector path. The let say 24 cycles of the
>C-routine may be scheduled with other branchless but more memory transfer
>instructions around - to improve ipc of the otherwise very dependent instruction
>chain with about only one instruction per cycle.
>
>If the routine is used rarely as in my case, (i traverse the boards the other
>way around) - it might be usefull.
>
>Out of curiosity - in my 32-bit program the routine with takes the largest
>number of 64 cycles, behaves best in my program - negligible and may be noise -
>however:
>
>// 64 cycles
>unsigned int bitScanReverse(BitBoard bb) {
> union {
> double d;
> struct {
> unsigned int mantissal : 32;
> unsigned int mantissah : 20;
> unsigned int exponent : 11;
> unsigned int sign : 1;
> };
> } ud;
> ud.d = (double)(bb & ~(bb >> 32));
> return ud.exponent - 1023;
>}
>
>the worst one:
>
>// 30 cycles - implies parameter via memory even if inlined
>unsigned int bitScanReverse(BitBoard bb) {
> __asm {
> bsr eax,[bb]
> bsr eax,[bb+4]
> setnz dl
> shl dl, 5
> add al, dl
> }
>}
>
>Gerd
About the assembler routine being slower :
I found that using a call by reference instead of a call by value speeds up the
function in Delphi a great deal because you get the adress of the bitboard
directly in the EAX register, don't know about C though but perhaps it will be
the same or perhaps its just a thing of the Delphi compiler.
I also try to optimize performance by doing the first bitscan on the side of the
board where the bit is most likely to occur. For example if i have to find the
white king, the piece is most likely to be found on the white side of the board.
That way BSF/BSR is executed only once in most cases.
Here's the code
function BitScanForwardBlack(var Bitboard: TBitboard): TSquare;
// Returns the first occupied square of a bitboard,
// scanning forward and starting with the black side of the board
asm
bsf eax, [eax]
jnz @End
bsf eax, [eax+4]
jz @End
add eax, 32
@End:
end;
function BitScanReverseWhite(var Bitboard: TBitboard): TSquare;
// Returns the first occupied square of a bitboard,
// scanning backward and starting with the white side of the board
asm
bsr eax, [eax+4]
jz @Black
add eax, 32
jmp @End
@Black:
bsr eax, [eax]
@End:
end;
Fred
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.