Computer Chess Club Archives

Subject: Re: Fast 3DNow! BitScan, one more faster

Author: Sune Fischer

Date: 08:53:37 12/02/02

On December 02, 2002 at 08:59:38, Gerd Isenberg wrote:

>On December 02, 2002 at 07:19:06, Sune Fischer wrote:
>
>>On December 01, 2002 at 17:05:06, Gerd Isenberg wrote:
>>
>>>oups, something shorter and faster:
>>>
>>>int getBitIndex(BitBoard singleBit)
>>>{
>>>	__asm
>>>	{
>>>		pxor	mm2, mm2	; 0
>>>		movd		mm0, [singleBit]
>>>		punpckldq	mm0, [singleBit+4]
>>>		pcmpeqd	mm6, mm6	; -1
>>>		pxor	mm7, mm7	; 0
>>>		pcmpeqd	mm2, mm0	; ~mask of the nonzero dword
>>>		PI2FD	mm1, mm0	; 3f8..,400..
>>>		pxor	mm2, mm6	; mask of the nonzero dword
>>>		psrlq	mm6, 63		; 01
>>>		psrld	mm1, 23		; 3f8 to 7f
>>>		psrld	mm2, 25		; 7f mask
>>>		psllq	mm6, 32+5	; 20:00
>>>		psubd	mm1, mm2	; - 7f mask
>>>		por	mm1, mm6	; + 32 in high dword
>>>		pand	mm1, mm2	; & 7f mask
>>>		psadbw	mm1, mm7	; add all bytes
>>>		movd	eax, mm1
>>>	}
>>>}
>>
>>This is great, I will try it.
>>
>>What I really need is GetFirstBitAndReset() functions.
><snip>
>
>>Is it possible to make it xor out the bit it found too?
>>Perhaps it is too complicated, in my case I think b&(-b) needs to be in
>>assembler, so that the precondition is removed entirely.
>
>
>Hi Sune,
>
>hmm, let's try it on the fly (I'm at work, so not tested or optimized so far):
>
>int bitSearchAndReset(BitBoard &bb)
>{
>    BitBoard lsb = bb & -((__int64)bb);
>    bb ^= lsb;
>    return getBitIndex(lsb); // should be inlined
>}
>
>With mmx there is some trouble with the 64-bit twos-complement, because there is
>no paddq:
>
>int bitSearchAndReset(BitBoard &bb)
>{
>	__asm
>	{
>		pxor	mm2, mm2	; 0
>		pxor	mm3, mm3	; 0
>		pcmpeqd	mm6, mm6	; -1
>		pcmpeqd	mm1, mm1	; -1
>
>                mov     eax, [bb]
>		movq	mm0, [eax]      ; assume properly aligned bitboard
>		psrlq	mm6, 63		; 00:01
>                pxor    mm1, mm0	; ~bb, ones complement
>                paddd   mm1, mm6        ; +1 but no overflow to high dword
>		psllq	mm6, 32         ; 01:00
>		pcmpeqd	mm3, mm1        ; look whether low dword is zero due to overflow
>                psllq   mm3, 1          ; shift carry to the right place
>                pand    mm3, mm6        ;  ... and mask 1
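>		paddd	mm1, mm3	; add possible overflow, now we have -bb
>		movq	mm4, mm0	; keep a copy of bb
>		pand	mm0, mm1	; lsb = bb & -bb
>		pxor	mm7, mm7	; 0
>		pxor	mm4, mm0	; reset lsb in the copy
>		movq	[eax], mm4	; store bb back (pxor has no memory-destination form)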
>
>		pcmpeqd	mm5, mm5	; -1 (mm6 holds 01:00 here, so it cannot be used to complement)
>		pcmpeqd	mm2, mm0	; ~mask of the nonzero dword
>		PI2FD	mm1, mm0	; 3f8..,400..
>		pxor	mm2, mm5	; mask of the nonzero dword
>		psrld	mm1, 23		; 3f8 to 7f
>		psrld	mm2, 25		; 7f mask
>		psllq	mm6, 5		; 20:00
>		psubd	mm1, mm2	; - 7f mask
>		por	mm1, mm6	; + 32 in high dword
>		pand	mm1, mm2	; & 7f mask
>		psadbw	mm1, mm7	; add all bytes
>		movd	eax, mm1
>	}
>}
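
For reference, the carry handling in the MMX negate above can be sketched in portable C++. This is only an illustration of the arithmetic, not the code from the post: the helper name `negateVia32BitHalves` is made up here, and the `if` stands in for the pcmpeqd/psllq/pand sequence that builds the carry without a branch.

```cpp
#include <cstdint>

// Hypothetical helper: computes -bb using only 32-bit additions,
// the way paddd has to when no 64-bit add is available.
std::uint64_t negateVia32BitHalves(std::uint64_t bb)
{
    std::uint32_t lo = static_cast<std::uint32_t>(~bb);        // low half of ~bb
    std::uint32_t hi = static_cast<std::uint32_t>(~bb >> 32);  // high half of ~bb

    lo += 1u;            // +1 on the low half only, no carry into the high half
    if (lo == 0u) {      // low half wrapped to zero, so a carry was lost
        hi += 1u;        // add the lost carry to the high half
    }
    return (static_cast<std::uint64_t>(hi) << 32) | lo;        // this is -bb
}
```

The low half wraps to zero exactly when the low dword of bb is zero, which is the only case where the carry has to reach the high dword.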
>
>>Is it possible to do a similar optimization on 32 bit?
>
>maybe...
>
>>
>>I have this:  oups, there is a serious error!
>>uint32  FirstBit32(uint32 bitmap)
>>{
>>	__asm
>>	{
>>		bsf	eax, [bitmap]
>>		jnz	done
>>		mov	eax, 0 // That is even true if bit 0 is set !!!
>>	done:
>>	}
>>}
>
>should be:
>
>uint32  FirstBit32(uint32 bitmap)
>{
>	__asm
>	{
>		bsf	eax, [bitmap]
>		jnz	done
>		mov	eax, 0xffffffff
>	done:
>	}
>}

It would crash if I tried continuing with that value, so it has to be a value
in the range 0-31.
I know that makes it useless for error detection, but I don't use that anyway,
hence my question about the preconditioning :)

>or
>
>int FirstBit32(uint32 bitmap)
>{
>	__asm
>	{
>		bsf	eax, [bitmap]
>		jnz	done
>		mov	eax, -1
>	done:
>	}
>}

Mixing signed and unsigned just slows things down. I always use unsigned when
possible; it gives a measurable boost! :)

>>
>>I would like functions that take as a precondition that the bitboard is not empty, i.e. that at
>>least 1 bit is set. The little function above isn't optimized for that, how do I change it?

So would this work?
uint32 FirstBit32(uint32 bitmap)
{
	__asm
	{
		bsf	eax, [bitmap]
	}
}

I do not want redundant assembler lines. I already know the bitmaps (32 or 64
bits) aren't 0 because they are scanned inside while loops, so I hope no test
for zero is needed?
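
To make the precondition idea concrete, here is a minimal sketch outside of inline assembler. It assumes GCC or Clang for `__builtin_ctz` (whose result is undefined for 0, much like the bsf destination) and falls back to a plain loop elsewhere. The name `firstBit32NonZero` is made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical name. Assumes the caller guarantees bitmap != 0,
// so no sentinel value and no zero test is needed in release builds.
inline std::uint32_t firstBit32NonZero(std::uint32_t bitmap)
{
    assert(bitmap != 0);   // only checked in debug builds
#if defined(__GNUC__) || defined(__clang__)
    // Count trailing zeros = index of the lowest set bit.
    return static_cast<std::uint32_t>(__builtin_ctz(bitmap));
#else
    // Portable fallback: shift until the lowest set bit reaches bit 0.
    std::uint32_t index = 0;
    while ((bitmap & 1u) == 0u) {
        bitmap >>= 1;
        ++index;
    }
    return index;
#endif
}
```

With the precondition stated up front, the result is always in the range 0-31 and there is nothing left to branch on.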

>>Thanks :)
>>-S.
>
>or this one for Athlon with PI2FD with reset of the found bit:
>
>// should return < 0 (0x80000001) if bitmap is zero
>// not tested !!
>
>int FirstBit32WithReset(unsigned int &bitmap)
>{
>	__asm
>	{
>		mov	ebx, [bitmap]
>		xor	edx, edx        ; 0
>		mov	eax, [ebx]      ; b
>		sub	edx, eax        ; -b
>		and	eax, edx        ; b & -b
>                xor     [ebx], eax      ; reset bit, if any
>		movd    mm0, eax        ; hmm... vector path
>		PI2FD	mm0, mm0	; 0-0; 1-3f8.., 2-400.., 4-408
>		movd	eax, mm0
>		shr     eax, 23         ; 0, 7f, 80, 81...
>		sub     eax, 0x7f
>		and     eax, 0x8000001f
>	}
>}
>
>But there are a lot of register dependencies...

Hmm, maybe. I just thought it would be "nice" not to have that x&=x-1
everywhere. It's more compact and a little less error-prone; sometimes I
forget it and have to debug an infinite loop.
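
What I have in mind looks roughly like the sketch below. The names `popFirstBit` and `sumOfBitIndices` are made up for this example, and the plain loop in the scan only stands in for whichever fast bit scan ends up being used. The point is that the reset lives inside the helper, so the caller's loop cannot forget it.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint64_t BitBoard;

// Hypothetical helper: returns the index of the lowest set bit
// and clears that bit in bb. Assumes bb != 0.
inline unsigned popFirstBit(BitBoard &bb)
{
    assert(bb != 0);
    BitBoard lsb = bb & (~bb + 1);   // isolate the lowest set bit (bb & -bb)
    bb ^= lsb;                       // reset it, replacing the separate x &= x-1
    unsigned index = 0;
    while ((lsb >>= 1) != 0) {       // slow placeholder for the real bit scan
        ++index;
    }
    return index;
}

// Example caller: the loop terminates because every call clears one bit.
inline unsigned sumOfBitIndices(BitBoard bb)
{
    unsigned sum = 0;
    while (bb) {
        sum += popFirstBit(bb);
    }
    return sum;
}
```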

I will try them, thanks :)

-S.

>Gerd



Last modified: Thu, 15 Apr 21 08:11:13 -0700
