Author: Gerd Isenberg
Date: 05:59:38 12/02/02
Go up one level in this thread
On December 02, 2002 at 07:19:06, Sune Fischer wrote:
>On December 01, 2002 at 17:05:06, Gerd Isenberg wrote:
>
>>oups, something shorter and faster:
>>
>>int getBitIndex(BitBoard singleBit)
>>{
>> __asm
>> {
>> pxor mm2, mm2 ; 0
>> movd mm0, [singleBit]
>> punpckldq mm0, [singleBit+4]
>> pcmpeqd mm6, mm6 ; -1
>> pxor mm7, mm7 ; 0
>> pcmpeqd mm2, mm0 ; ~mask of the none zero dword
>> PI2FD mm1, mm0 ; 3f8..,400..
>> pxor mm2, mm6 ; mask of the none zero dword
>> psrlq mm6, 63 ; 01
>> psrld mm1, 23 ; 3f8 to 7f
>> psrld mm2, 25 ; 7f mask
>> psllq mm6, 32+5 ; 20:00
>> psubd mm1, mm2 ; - 7f mask
>> por mm1, mm6 ; + 32 in high dword
>> pand mm1, mm2 ; & 7f mask
>> psadbw mm1, mm7 ; add all bytes
>> movd eax, mm1
>> }
>>}
>
>This is great, I will try it.
>
>What I really need is GetFirstBitAndReset() functions.
<snip>
>Is it possible to make it xor out the bit it found too?
>Perhaps it is too complicated, in my case I think b&(-b) needs to be in
>assembler, so that the precondition is removed entirely.
Hi Sune,
hmm, lets try it on the fly (i'm at work, so not tested and optimized so far):
int bitSearchAndReset(BitBoard &bb)
{
BitBoard lsb = bb & -((__int64)bb);
bb ^= lsb;
return getBitIndex(lsb); // should be inlined
}
With mmx there is some trouble with the 64-bit twos-complement, because there is
no paddq:
int bitSearchAndReset(BitBoard &bb)
{
__asm
{
pxor mm2, mm2 ; 0
pxor mm3, mm3 ; 0
pcmpeqd mm6, mm6 ; -1
pcmpeqd mm1, mm1 ; -1
mov eax, [bb]
movq mm0, [eax] ; assume properly aligned bitboard
psrlq mm6, 63 ; 00:01
pxor mm1, mm0 ; ~bb, ones complement
paddd mm1, mm6 ; +1 but no overflow to high dword
psllq mm6, 32 ; 01:00
pcmpeqd mm3, mm1 ; look whether low dword is zero due to overflow
psllq mm3, 1 ; shift carry to the right place
pand mm3, mm6 ; ... and mask 1
paddd mm1, mm3 ; add possible overflow, no we have -bb
pand mm0, mm1 ; lsb = bb & -bb
pxor mm7, mm7 ; 0
pxor [eax], mm0 ; reset lsb in bb
pcmpeqd mm2, mm0 ; ~mask of the none zero dword
PI2FD mm1, mm0 ; 3f8..,400..
pxor mm2, mm6 ; mask of the none zero dword
psrld mm1, 23 ; 3f8 to 7f
psrld mm2, 25 ; 7f mask
psllq mm6, 5 ; 20:00
psubd mm1, mm2 ; - 7f mask
por mm1, mm6 ; + 32 in high dword
pand mm1, mm2 ; & 7f mask
psadbw mm1, mm7 ; add all bytes
movd eax, mm1
}
}
>Is it possible to do a similar optimization on 32 bit?
may be...
>
>I have this: oups, there is a serious error!
>uint32 FirstBit32(uint32 bitmap)
>{
> __asm
> {
> bsf eax, [bitmap]
> jnz done
> mov eax, 0 // That is even true if bit 0 is set !!!
> done:
> }
>}
should be:
uint32 FirstBit32(uint32 bitmap)
{
__asm
{
bsf eax, [bitmap]
jnz done
mov eax, 0xffffffff
done:
}
}
or
int FirstBit32(uint32 bitmap)
{
__asm
{
bsf eax, [bitmap]
jnz done
mov eax, -1
done:
}
}
>
>I would like functions that precondition the bitboard is not empty, ie. that at
>least 1 bit is set. The little function above isn't optimized for that, how do I change it?
>
>Thanks :)
>-S.
or this one for Athlon with PI2FD with reset of the found bit:
// should return < 0 (0x80000000) if bitmap is zero
// not tested !!
int FirstBit32WithReset(unsigned int &bitmap)
{
__asm
{
mov ebx, [bitmap]
xor edx, edx ; 0
mov eax, [ebx] ; b
sub edx, eax ; -b
and eax, edx ; b & -b
xor [ebx], eax ; reset bit, if any
movd mm0, eax ; hmm... vector path
PI2FD mm0, mm0 ; 0-0; 1-3f8.., 2-400.., 4-408
movd eax, mm0
shr eax, 23 ; 0, 7f, 80, 81...
sub eax, 0x7f
and eax, 0x8000001f
}
}
But there are a lot of register dependencies...
Gerd
This page took 0.03 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.