Author: Sune Fischer
Date: 08:53:37 12/02/02
Go up one level in this thread
On December 02, 2002 at 08:59:38, Gerd Isenberg wrote:
>On December 02, 2002 at 07:19:06, Sune Fischer wrote:
>
>>On December 01, 2002 at 17:05:06, Gerd Isenberg wrote:
>>
>>>oups, something shorter and faster:
>>>
>>>int getBitIndex(BitBoard singleBit)
>>>{
>>> __asm
>>> {
>>> pxor mm2, mm2 ; 0
>>> movd mm0, [singleBit]
>>> punpckldq mm0, [singleBit+4]
>>> pcmpeqd mm6, mm6 ; -1
>>> pxor mm7, mm7 ; 0
>>> pcmpeqd mm2, mm0 ; ~mask of the none zero dword
>>> PI2FD mm1, mm0 ; 3f8..,400..
>>> pxor mm2, mm6 ; mask of the none zero dword
>>> psrlq mm6, 63 ; 01
>>> psrld mm1, 23 ; 3f8 to 7f
>>> psrld mm2, 25 ; 7f mask
>>> psllq mm6, 32+5 ; 20:00
>>> psubd mm1, mm2 ; - 7f mask
>>> por mm1, mm6 ; + 32 in high dword
>>> pand mm1, mm2 ; & 7f mask
>>> psadbw mm1, mm7 ; add all bytes
>>> movd eax, mm1
>>> }
>>>}
>>
>>This is great, I will try it.
>>
>>What I really need is GetFirstBitAndReset() functions.
><snip>
>
>>Is it possible to make it xor out the bit it found too?
>>Perhaps it is too complicated, in my case I think b&(-b) needs to be in
>>assembler, so that the precondition is removed entirely.
>
>
>Hi Sune,
>
>hmm, lets try it on the fly (i'm at work, so not tested and optimized so far):
>
>int bitSearchAndReset(BitBoard &bb)
>{
> BitBoard lsb = bb & -((__int64)bb);
> bb ^= lsb;
> return getBitIndex(lsb); // should be inlined
>}
>
>With mmx there is some trouble with the 64-bit twos-complement, because there is
>no paddq:
>
>int bitSearchAndReset(BitBoard &bb)
>{
> __asm
> {
> pxor mm2, mm2 ; 0
> pxor mm3, mm3 ; 0
> pcmpeqd mm6, mm6 ; -1
> pcmpeqd mm1, mm1 ; -1
>
> mov eax, [bb]
> movq mm0, [eax] ; assume properly aligned bitboard
> psrlq mm6, 63 ; 00:01
> pxor mm1, mm0 ; ~bb, ones complement
> paddd mm1, mm6 ; +1 but no overflow to high dword
> psllq mm6, 32 ; 01:00
> pcmpeqd mm3, mm1 ; look whether low dword is zero due to overflow
> psllq mm3, 1 ; shift carry to the right place
> pand mm3, mm6 ; ... and mask 1
> paddd mm1, mm3 ; add possible overflow, no we have -bb
> pand mm0, mm1 ; lsb = bb & -bb
> pxor mm7, mm7 ; 0
> pxor [eax], mm0 ; reset lsb in bb
>
> pcmpeqd mm2, mm0 ; ~mask of the none zero dword
> PI2FD mm1, mm0 ; 3f8..,400..
> pxor mm2, mm6 ; mask of the none zero dword
> psrld mm1, 23 ; 3f8 to 7f
> psrld mm2, 25 ; 7f mask
> psllq mm6, 5 ; 20:00
> psubd mm1, mm2 ; - 7f mask
> por mm1, mm6 ; + 32 in high dword
> pand mm1, mm2 ; & 7f mask
> psadbw mm1, mm7 ; add all bytes
> movd eax, mm1
> }
>}
>
>>Is it possible to do a similar optimization on 32 bit?
>
>may be...
>
>>
>>I have this: oups, there is a serious error!
>>uint32 FirstBit32(uint32 bitmap)
>>{
>> __asm
>> {
>> bsf eax, [bitmap]
>> jnz done
>> mov eax, 0 // That is even true if bit 0 is set !!!
>> done:
>> }
>>}
>
>should be:
>
>uint32 FirstBit32(uint32 bitmap)
>{
> __asm
> {
> bsf eax, [bitmap]
> jnz done
> mov eax, 0xffffffff
> done:
> }
>}
It would crash if I tried continuing with that value, so it has to be a value
in the range 0-31.
I know it is no good for error detection then, but I don't use that anyway,
hence my question for the precoditioning :)
>or
>
>int FirstBit32(uint32 bitmap)
>{
> __asm
> {
> bsf eax, [bitmap]
> jnz done
> mov eax, -1
> done:
> }
>}
signed/unsigned mix is just slowing things down. I always use unsigned when
possible, it gives a measurable boost! :)
>>
>>I would like functions that precondition the bitboard is not empty, ie. that at
>>least 1 bit is set. The little function above isn't optimized for that, how do I change it?
so would this work?
uint32 FirstBit32(uint32 bitmap)
{
__asm
{
bsf eax, [bitmap]
}
}
?
I do not want redundant assembler lines, I already know the bitmaps (32 or 64
bits) aren't 0 because they are running inside while loops, so I hope there is
no testing of that?
>>Thanks :)
>>-S.
>
>or this one for Athlon with PI2FD with reset of the found bit:
>
>// should return < 0 (0x80000000) if bitmap is zero
>// not tested !!
>
>int FirstBit32WithReset(unsigned int &bitmap)
>{
> __asm
> {
> mov ebx, [bitmap]
> xor edx, edx ; 0
> mov eax, [ebx] ; b
> sub edx, eax ; -b
> and eax, edx ; b & -b
> xor [ebx], eax ; reset bit, if any
> movd mm0, eax ; hmm... vector path
> PI2FD mm0, mm0 ; 0-0; 1-3f8.., 2-400.., 4-408
> movd eax, mm0
> shr eax, 23 ; 0, 7f, 80, 81...
> sub eax, 0x7f
> and eax, 0x8000001f
> }
>}
>
>But there are a lot of register dependencies...
Hmm, may be, I just thought it would be "nice" to not have that x&=x-1
everywhere. It's more compact an a little bit less errorprone, sometimes I
forget and have to debug an infinite loop stall.
I will try them, thanks :)
-S.
>Gerd
This page took 0.02 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.