Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: just another reverse bitscan

Author: Frederik Tack

Date: 04:47:03 12/23/05

Go up one level in this thread


On December 22, 2005 at 16:29:40, Gerd Isenberg wrote:

>>Is this faster than Bob's bit routine in Crafty?
>>
>>Anyone compared?
>
>I guess not - it might be a nice portable backup.
>
>Iirc Crafty's FirstOne is based on leading zero count, so this bsr replacement
>is used very often with 63-bitScanReverse or so. It should be inlined for speed
>issues - but codesize will grow. Since the bitScan-intrinsic produces only one
>64-bit instruction - bsr reg64,reg64 which are only a 4 bytes instead of 100 or
>so for the C-routine. The codesize will explode a bit - if used often - which is
>likely in Crafty.
>
>Otoh the bsr reg64,reg64 is 9 cycles vector path. The let say 24 cycles of the
>C-routine may be scheduled with other branchless but more memory transfer
>instructions around - to improve ipc of the otherwise very dependent instruction
>chain with about only one instruction per cycle.
>
>If the routine is used rarely as in my case, (i traverse the boards the other
>way around) - it might be usefull.
>
>Out of curiosity - in my 32-bit program the routine with takes the largest
>number of 64 cycles, behaves best in my program - negligible and may be noise -
>however:
>
>// 64 cycles
>unsigned int bitScanReverse(BitBoard bb) {
>   union {
>      double d;
>      struct {
>         unsigned int mantissal : 32;
>         unsigned int mantissah : 20;
>         unsigned int exponent : 11;
>         unsigned int sign : 1;
>      };
>   } ud;
>   ud.d = (double)(bb & ~(bb >> 32));
>   return ud.exponent - 1023;
>}
>
>the worst one:
>
>// 30 cycles - implies parameter via memory even if inlined
>unsigned int bitScanReverse(BitBoard bb) {
>  __asm  {
>    bsr eax,[bb]
>    bsr eax,[bb+4]
>    setnz dl
>    shl	dl, 5
>    add al, dl
>  }
>}
>
>Gerd

About the assembler routine being slower :
I found that using a call by reference instead of a call by value speeds up the
function in Delphi a great deal because you get the adress of the bitboard
directly in the EAX register, don't know about C though but perhaps it will be
the same or perhaps its just a thing of the Delphi compiler.
I also try to optimize performance by doing the first bitscan on the side of the
board where the bit is most likely to occur. For example if i have to find the
white king, the piece is most likely to be found on the white side of the board.
That way BSF/BSR is executed only once in most cases.

Here's the code

function BitScanForwardBlack(var Bitboard: TBitboard): TSquare;
// Returns the first occupied square of a bitboard,
// scanning forward and starting with the black side of the board
asm
  bsf eax, [eax]
  jnz @End
  bsf eax, [eax+4]
  jz @End
  add eax, 32
  @End:
end;


function BitScanReverseWhite(var Bitboard: TBitboard): TSquare;
// Returns the first occupied square of a bitboard,
// scanning backward and starting with the white side of the board
asm
  bsr eax, [eax+4]
  jz @Black
  add eax, 32
  jmp @End
  @Black:
  bsr eax, [eax]
  @End:
end;


Fred

















This page took 0.01 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.