Author: Frederik Tack
Date: 04:47:03 12/23/05
Go up one level in this thread
On December 22, 2005 at 16:29:40, Gerd Isenberg wrote: >>Is this faster than Bob's bit routine in Crafty? >> >>Anyone compared? > >I guess not - it might be a nice portable backup. > >Iirc Crafty's FirstOne is based on leading zero count, so this bsr replacement >is used very often with 63-bitScanReverse or so. It should be inlined for speed >issues - but codesize will grow. Since the bitScan-intrinsic produces only one >64-bit instruction - bsr reg64,reg64 which are only a 4 bytes instead of 100 or >so for the C-routine. The codesize will explode a bit - if used often - which is >likely in Crafty. > >Otoh the bsr reg64,reg64 is 9 cycles vector path. The let say 24 cycles of the >C-routine may be scheduled with other branchless but more memory transfer >instructions around - to improve ipc of the otherwise very dependent instruction >chain with about only one instruction per cycle. > >If the routine is used rarely as in my case, (i traverse the boards the other >way around) - it might be usefull. > >Out of curiosity - in my 32-bit program the routine with takes the largest >number of 64 cycles, behaves best in my program - negligible and may be noise - >however: > >// 64 cycles >unsigned int bitScanReverse(BitBoard bb) { > union { > double d; > struct { > unsigned int mantissal : 32; > unsigned int mantissah : 20; > unsigned int exponent : 11; > unsigned int sign : 1; > }; > } ud; > ud.d = (double)(bb & ~(bb >> 32)); > return ud.exponent - 1023; >} > >the worst one: > >// 30 cycles - implies parameter via memory even if inlined >unsigned int bitScanReverse(BitBoard bb) { > __asm { > bsr eax,[bb] > bsr eax,[bb+4] > setnz dl > shl dl, 5 > add al, dl > } >} > >Gerd About the assembler routine being slower : I found that using a call by reference instead of a call by value speeds up the function in Delphi a great deal because you get the adress of the bitboard directly in the EAX register, don't know about C though but perhaps it will be the same or perhaps its just a thing of the Delphi compiler. I also try to optimize performance by doing the first bitscan on the side of the board where the bit is most likely to occur. For example if i have to find the white king, the piece is most likely to be found on the white side of the board. That way BSF/BSR is executed only once in most cases. Here's the code function BitScanForwardBlack(var Bitboard: TBitboard): TSquare; // Returns the first occupied square of a bitboard, // scanning forward and starting with the black side of the board asm bsf eax, [eax] jnz @End bsf eax, [eax+4] jz @End add eax, 32 @End: end; function BitScanReverseWhite(var Bitboard: TBitboard): TSquare; // Returns the first occupied square of a bitboard, // scanning backward and starting with the white side of the board asm bsr eax, [eax+4] jz @Black add eax, 32 jmp @End @Black: bsr eax, [eax] @End: end; Fred
This page took 0.01 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.