Author: Gerd Isenberg
Date: 02:08:28 12/02/02
On December 01, 2002 at 21:04:54, Arshad F. Syed wrote:

>I am not much of an expert on this, since I am just starting to get into
>assembly programming. But I was curious about the dword used here, which implies
>to me you are doing 32-bit stuff. Wouldn't it make life simpler to just get an
>Itanium processor and switch to 64-bit code, which would cut down the cycles
>used?
>
>Regards,
>Arshad

Hi Arshad,

Yes, the PI2FD instruction does two parallel 32-bit integer to 32-bit float conversions. This 3DNow! instruction will also be available on AMD's Hammer.

Hammer also offers 64-bit forms of the CVTSI2SD and CVTSI2SS instructions (their 32-bit forms are already part of the P4's SSE/SSE2). These may be a good alternative if bsf/bsr turns out to be as slow on Hammer as it is on the Athlon (a VectorPath instruction that blocks all other pipes and has a huge execution latency), even though only a single bsf reg64, reg64 would be needed.

CVTSI2SD xmm, reg/mem64   F2 REX.W 0F 2A /r
Converts a quadword integer in a general-purpose register or 64-bit memory location to a double-precision floating-point value in the destination XMM register.

CVTSI2SS xmm, reg/mem64   F3 REX.W 0F 2A /r
Converts a quadword integer in a general-purpose register or 64-bit memory location to a single-precision floating-point value in the destination XMM register.

A single bit 2^i converts exactly, and the biased exponent of the resulting float is 0x7f + i, so the bit index can be read straight from the exponent field:

int getBitIndexOnHammer(BitBoard singleBit)
{
  __asm
  {
    mov      rax, [singleBit]
    CVTSI2SS xmm0, rax    ; 2^i -> float with biased exponent 0x7f + i
    movq     rax, xmm0    ; float bit pattern back to rax
    shr      rax, 23      ; exponent field to the low bits
    sub      rax, 0x7f    ; remove the exponent bias
    and      rax, 0x3f    ; index 0..63, drops the sign bit (set for bit 63) and upper bits
  }
}

I find it amazing that, at least for two parallel bitscans, the popCount approach, and even more so a fast integer-to-float conversion (2.5 times as fast if you have already computed b & -b), clearly outperforms the bsf pair for 64 bits on the Athlon.
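For readers without a Hammer or without inline assembly, the exponent trick and the popCount approach mentioned above can be sketched in portable C. This is a minimal sketch under stated assumptions: `float` is IEEE-754 single precision, the float version expects exactly one set bit (isolate it with b & -b first), and the helper names bit_index_float, popcount64 and bit_index_popcount are hypothetical, not taken from the posts. It only illustrates the arithmetic and says nothing about the Athlon timings above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t BitBoard; /* assumed 64-bit bitboard, as in the posts */

/* Hypothetical helper: index of a single set bit via the float exponent.
   2^k converts exactly to float; its biased exponent is 0x7f + k. */
static int bit_index_float(BitBoard single_bit)
{
    assert(single_bit != 0 && (single_bit & (single_bit - 1)) == 0);
    float f = (float)single_bit;     /* exact for any power of two */
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* assumes 32-bit IEEE-754 float */
    /* same steps as shr 23, sub 0x7f, and 0x3f in the asm version */
    return (int)(((bits >> 23) - 0x7f) & 0x3f);
}

/* Hypothetical helper: SWAR population count of a 64-bit word. */
static int popcount64(uint64_t x)
{
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    return (int)((x * 0x0101010101010101ULL) >> 56); /* sum of the byte counts */
}

/* popCount approach: (b & -b) - 1 has exactly as many ones as there are
   bits below the lowest set bit, so its count is the index. Needs b != 0. */
static int bit_index_popcount(BitBoard b)
{
    assert(b != 0);
    return popcount64((b & (0 - b)) - 1); /* 0 - b is -b for unsigned */
}
```

For example, with b = 0x0000100000000800 (bits 11 and 44 set), both bit_index_float(b & (0 - b)) and bit_index_popcount(b) yield 11, the index of the lowest set bit.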
Gerd

>
>
>On December 01, 2002 at 17:05:06, Gerd Isenberg wrote:
>
>>oops, something shorter and faster:
>>
>>int getBitIndex(BitBoard singleBit)
>>{
>>  __asm
>>  {
>>    pxor      mm2, mm2         ; 0
>>    movd      mm0, [singleBit]
>>    punpckldq mm0, [singleBit+4]
>>    pcmpeqd   mm6, mm6         ; -1
>>    pxor      mm7, mm7         ; 0
>>    pcmpeqd   mm2, mm0         ; ~mask of the nonzero dword
>>    PI2FD     mm1, mm0         ; 3f8..,400..
>>    pxor      mm2, mm6         ; mask of the nonzero dword
>>    psrlq     mm6, 63          ; 01
>>    psrld     mm1, 23          ; 3f8 to 7f
>>    psrld     mm2, 25          ; 7f mask
>>    psllq     mm6, 32+5        ; 20:00
>>    psubd     mm1, mm2         ; - 7f mask
>>    por       mm1, mm6         ; + 32 in high dword
>>    pand      mm1, mm2         ; & 7f mask
>>    psadbw    mm1, mm7         ; add all bytes
>>    movd      eax, mm1
>>  }
>>}
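As a cross-check of the quoted MMX routine, here is a scalar C model of what it computes lane by lane: convert each dword to float, build a 0x7f mask for the nonzero dword, subtract the bias, add 32 in the high dword, mask, and sum. It is a sketch, not a drop-in replacement; the helper names dword_exponent and bit_index_two_dwords are hypothetical, and it assumes `float` is IEEE-754 single precision and that exactly one bit is set.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t BitBoard; /* assumed 64-bit bitboard type */

/* Hypothetical helper: one lane of PI2FD followed by psrld 23.
   PI2FD converts signed dwords, so bit 31 would also set the float's sign
   bit; converting as unsigned avoids that, and the 0x7f mask would clear it. */
static uint32_t dword_exponent(uint32_t dword)
{
    float f = (float)dword;          /* 0 -> 0.0f, 2^k -> exponent 0x7f + k */
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* assumes 32-bit IEEE-754 float */
    return bits >> 23;
}

/* Hypothetical scalar model of getBitIndex: both dwords are handled
   the same way, then the two partial results are added. */
static int bit_index_two_dwords(BitBoard single_bit)
{
    assert(single_bit != 0 && (single_bit & (single_bit - 1)) == 0);
    uint32_t lo = (uint32_t)single_bit;
    uint32_t hi = (uint32_t)(single_bit >> 32);

    /* 0x7f for the nonzero dword, 0 otherwise (pcmpeqd, pxor, psrld 25) */
    uint32_t mask_lo = lo ? 0x7fu : 0u;
    uint32_t mask_hi = hi ? 0x7fu : 0u;

    /* remove the bias (psubd), add 32 in the high dword (por), mask (pand) */
    uint32_t idx_lo = (dword_exponent(lo) - mask_lo) & mask_lo;
    uint32_t idx_hi = ((dword_exponent(hi) - mask_hi) | 32u) & mask_hi;

    /* psadbw adds the bytes; the zero dword contributes nothing */
    return (int)(idx_lo + idx_hi);
}
```

The mask does double duty, as in the MMX code: it subtracts the bias only in the nonzero dword and zeroes the other dword, including the 32 that por would otherwise leave in a zero high dword.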
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.