Author: Gerd Isenberg
Date: 13:29:40 12/22/05
Go up one level in this thread
>Is this faster than Bob's bit routine in Crafty? > >Anyone compared? I guess not - it might be a nice portable backup. Iirc Crafty's FirstOne is based on leading zero count, so this bsr replacement is used very often with 63-bitScanReverse or so. It should be inlined for speed issues - but codesize will grow. Since the bitScan-intrinsic produces only one 64-bit instruction - bsr reg64,reg64 which are only a 4 bytes instead of 100 or so for the C-routine. The codesize will explode a bit - if used often - which is likely in Crafty. Otoh the bsr reg64,reg64 is 9 cycles vector path. The let say 24 cycles of the C-routine may be scheduled with other branchless but more memory transfer instructions around - to improve ipc of the otherwise very dependent instruction chain with about only one instruction per cycle. If the routine is used rarely as in my case, (i traverse the boards the other way around) - it might be usefull. Out of curiosity - in my 32-bit program the routine with takes the largest number of 64 cycles, behaves best in my program - negligible and may be noise - however: // 64 cycles unsigned int bitScanReverse(BitBoard bb) { union { double d; struct { unsigned int mantissal : 32; unsigned int mantissah : 20; unsigned int exponent : 11; unsigned int sign : 1; }; } ud; ud.d = (double)(bb & ~(bb >> 32)); return ud.exponent - 1023; } the worst one: // 30 cycles - implies parameter via memory even if inlined unsigned int bitScanReverse(BitBoard bb) { __asm { bsr eax,[bb] bsr eax,[bb+4] setnz dl shl dl, 5 add al, dl } } Gerd
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.