Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Fast 3DNow! BitScan, one more faster

Author: Gerd Isenberg

Date: 05:59:38 12/02/02

Go up one level in this thread


On December 02, 2002 at 07:19:06, Sune Fischer wrote:

>On December 01, 2002 at 17:05:06, Gerd Isenberg wrote:
>
>>oups, something shorter and faster:
>>
>>int getBitIndex(BitBoard singleBit)
>>{
>>	__asm
>>	{
>>		pxor	mm2, mm2	; 0
>>		movd		mm0, [singleBit]
>>		punpckldq	mm0, [singleBit+4]
>>		pcmpeqd	mm6, mm6	; -1
>>		pxor	mm7, mm7	; 0
>>		pcmpeqd	mm2, mm0	; ~mask of the none zero dword
>>		PI2FD	mm1, mm0	; 3f8..,400..
>>		pxor	mm2, mm6	; mask of the none zero dword
>>		psrlq	mm6, 63		; 01
>>		psrld	mm1, 23		; 3f8 to 7f
>>		psrld	mm2, 25		; 7f mask
>>		psllq	mm6, 32+5	; 20:00
>>		psubd	mm1, mm2	; - 7f mask
>>		por	mm1, mm6	; + 32 in high dword
>>		pand	mm1, mm2	; & 7f mask
>>		psadbw	mm1, mm7	; add all bytes
>>		movd	eax, mm1
>>	}
>>}
>
>This is great, I will try it.
>
>What I really need is GetFirstBitAndReset() functions.
<snip>

>Is it possible to make it xor out the bit it found too?
>Perhaps it is too complicated, in my case I think b&(-b) needs to be in
>assembler, so that the precondition is removed entirely.


Hi Sune,

hmm, lets try it on the fly (i'm at work, so not tested and optimized so far):

int bitSearchAndReset(BitBoard &bb)
{
    BitBoard lsb = bb & -((__int64)bb);
    bb ^= lsb;
    return getBitIndex(lsb); // should be inlined
}

With mmx there is some trouble with the 64-bit twos-complement, because there is
no paddq:

int bitSearchAndReset(BitBoard &bb)
{
	__asm
	{
		pxor	mm2, mm2	; 0
		pxor	mm3, mm3	; 0
		pcmpeqd	mm6, mm6	; -1
		pcmpeqd	mm1, mm1	; -1

                mov     eax, [bb]
		movq	mm0, [eax]      ; assume properly aligned bitboard
		psrlq	mm6, 63		; 00:01
                pxor    mm1, mm0	; ~bb, ones complement
                paddd   mm1, mm6        ; +1 but no overflow to high dword
		psllq	mm6, 32         ; 01:00
		pcmpeqd	mm3, mm1        ; look whether low dword is zero due to overflow
                psllq   mm3, 1          ; shift carry to the right place
                pand    mm3, mm6        ;  ... and mask 1
                paddd   mm1, mm3        ; add possible overflow, no we have -bb
                pand    mm0, mm1	; lsb = bb & -bb
		pxor	mm7, mm7	; 0
		pxor	[eax], mm0      ; reset lsb in bb

		pcmpeqd	mm2, mm0	; ~mask of the none zero dword
		PI2FD	mm1, mm0	; 3f8..,400..
		pxor	mm2, mm6	; mask of the none zero dword
		psrld	mm1, 23		; 3f8 to 7f
		psrld	mm2, 25		; 7f mask
		psllq	mm6, 5		; 20:00
		psubd	mm1, mm2	; - 7f mask
		por	mm1, mm6	; + 32 in high dword
		pand	mm1, mm2	; & 7f mask
		psadbw	mm1, mm7	; add all bytes
		movd	eax, mm1
	}
}

>Is it possible to do a similar optimization on 32 bit?

may be...

>
>I have this:  oups, there is a serious error!
>uint32  FirstBit32(uint32 bitmap)
>{
>	__asm
>	{
>		bsf	eax, [bitmap]
>		jnz	done
>		mov	eax, 0 // That is even true if bit 0 is set !!!
>	done:
>	}
>}

should be:

uint32  FirstBit32(uint32 bitmap)
{
	__asm
	{
		bsf	eax, [bitmap]
		jnz	done
		mov	eax, 0xffffffff
	done:
	}
}

or

int FirstBit32(uint32 bitmap)
{
	__asm
	{
		bsf	eax, [bitmap]
		jnz	done
		mov	eax, -1
	done:
	}
}


>
>I would like functions that precondition the bitboard is not empty, ie. that at
>least 1 bit is set. The little function above isn't optimized for that, how do I change it?
>
>Thanks :)
>-S.

or this one for Athlon with PI2FD with reset of the found bit:

// should return < 0 (0x80000000) if bitmap is zero
// not tested !!

int FirstBit32WithReset(unsigned int &bitmap)
{
	__asm
	{
		mov	ebx, [bitmap]
		xor	edx, edx        ; 0
		mov	eax, [ebx]      ; b
		sub	edx, eax        ; -b
		and	eax, edx        ; b & -b
                xor     [ebx], eax      ; reset bit, if any
		movd    mm0, eax        ; hmm... vector path
		PI2FD	mm0, mm0	; 0-0; 1-3f8.., 2-400.., 4-408
		movd	eax, mm0
		shr     eax, 23         ; 0, 7f, 80, 81...
		sub     eax, 0x7f
		and     eax, 0x8000001f
	}
}

But there are a lot of register dependencies...

Gerd




This page took 0.03 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.