Author: Gerd Isenberg
Date: 12:56:58 12/01/02
Hi all,
The fastest 64bit - BitScan with Athlons so far!
The AMD-3DNow! Instruction PI2FD does a packed 32-bit (signed) integer to
floating-point (IEEE 32-bit) conversion. The PI2FD (direct path, 4 cycles
latency) instruction performs the following operations:
mmreg1 [31: 0] == float(mmreg2/mem64 [31: 0])
mmreg1 [63:32] == float(mmreg2/mem64 [63:32])
Bit 31 of the float is the sign bit, the eight bits 30:23 are the biased
exponent and the remaining 23 bits are the significand.
If you pass single-bit bitboards through this instruction, the bitscan idea
becomes obvious clear due to the biased base two exponent.
bit 0-63 PI2FD
00000001 3F800000
00000002 40000000
00000004 40800000
00000008 41000000
00000010 41800000
..
40000000 4E800000
80000000 CF000000 ! sign
Some additional shift, substraction and masks, and there is faster bitscan
routine than with two 32-bit bsf-instructions!
// Athlon only due to pswapd and PI2FD
// precondition: BBs has exactly one bit set
// may be used as bitscan with (b&-b)
//===========================================
int getBitIndex(BitBoard singleBit)
{
__asm
{
pxor mm2, mm2 ; 0
movd mm0, [singleBit]
punpckldq mm0, [singleBit+4]
pcmpeqd mm4, mm4 ; -1
pcmpeqd mm3, mm3 ; -1
pcmpeqd mm2, mm0 ; -1-mask for the zero dword
PI2FD mm1, mm0 ; 3f8..,400..
psrlq mm4, 63 ; 00:01
pxor mm2, mm3 ; -1-mask for the none zero dword
psrld mm3, 25 ; 7f:7f
psrld mm1, 23 ; 3f8.. to 7f
psllq mm4, 32+5 ; 20:00 to add 32 to high dword
psubd mm1, mm3 ; - 7f:7f
pand mm1, mm3 ; & 7f:7f
por mm1, mm4 ; + 32 to high dword
pand mm1, mm2 ; mask apropriate low- or high dword
pswapd mm2, mm1 ; swap dwords
por mm1, mm2 ; combine low and high dword
movd eax, mm1
}
}
Even if there is some effort to calculate the consts 7f:7f and 20:0, this is
shlightly faster than the bsf pair, even with (b&-b) outside the functions!
A parallized verions with two bitboards (forBB, toBB) does very well, only 12.5
seconds for 10^9 runs, faster than a single bsf pair.
Cheers,
Gerd
This page took 0.02 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.