Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: BitScan with reset - not so impressive with 3DNow!

Author: Miguel A. Ballicora

Date: 10:35:54 12/03/02

Go up one level in this thread


On December 03, 2002 at 10:09:31, Gerd Isenberg wrote:

>On December 03, 2002 at 06:40:36, Gian-Carlo Pascutto wrote:
>
>>On December 02, 2002 at 19:43:15, Gerd Isenberg wrote:
>>
>>>Congrats, Walter!!
>>>
>>>10-bit pattern	bsf	PI2FD	btr	c      LSB_64
>>>0x0000000011111133	15.3	18.0	19.1	22.8    17.8
>>>0x1010111010101110	19.7	18.5	19.6	23.4    17.8
>>>0x1111113300000000	20.6	18.0	19.1	22.8    17.8
>>
>>Does this mean that this is the fastest currently known
>>FindFirstAndRemove *and* that it doesn't need assembly?
>
>Hi Gian-Carlo,
>
>Yes, it seems so. Walter's "magic" routine has only cheap instructions and one
>lookup with a cache friendly table of 154 Bytes.
>
>Even if the conditional bsf-approach is obviously faster, if we have one-bit's
>in the lower half, i hate this kind of asymmetry.
>
>Let's see how fast Hammer's bsf reg64,reg64 will be.
>If it sucks like Athlon, there are a lot of good alternatives.
>I guess that with 64-bit arithmetic, to do a fast single bit isolation the trick
>with 64int to 32bit float conversion is promising.
>
>What i like on Walter's approach is the kind of folding down the sequence of
>consecutive trailing one's  (SingleBit - 1) from one 64-bit word to one 32 bit
>word without loosing any information.
>
>eg
>0000..011|1111...1111
>          1111...1100
>
>Then this wonderful magic xor, some parallel prefix shifts with add and sub, to
>get a unique value in this 0..153 range. Great respect to Walter, to invent
>these algorthm.
>
>Looking for something similar, with two bitboards (fromBB, toBB of valid moves)
>to get unique value in a range of let say 0 to 64*28-1 or 2047!
>
>
>>
>>Impressive for sure!
>>
>>Any similar tricks for lightning-speed popcounting?
>
>I fear with random bits it's not so easy, but who knows...
>
>>
>>What's the fastest sequence you've got for popcounting so
>>far? (preferably one that doesn't depend too much on new
>>instructions)
>
>I use the Athlon MMX-routine provided by AMD's Optimization Guide based on this
>c-routine:
>
>UINT32 BitCount (BitBoard bb)
>{
>	static const UINT32 m1 = 0x55555555;
>	static const UINT32 m2 = 0x33333333;
>	static const UINT32 m3 = 0x0f0f0f0f;
>	UINT32 l = (UINT32) bb;
>	UINT32 h = (UINT32)(bb>>32);
>	l = l - ((l>>1) & m1);
>	h = h - ((h>>1) & m1);
>	l = (l & m2) + ((l>>2) & m2);
>	h = (h & m2) + ((h>>2) & m2);
>	l = (l & m3) + ((l>>4) & m3);
>	h = (h & m3) + ((h>>4) & m3);
>	l = (l & 0x0ffff) + (l>>16);
>	h = (h & 0x0ffff) + (h>>16);
>	return ((l & 0x0ff) + (l>>8) + (h & 0x0ff) + (h>>8));
>}
>
>For rarely populated bitboards (< 8), i use a count loop with bb &= bb-1.
>In this context i use also boolean inlnes like
>
>BOOL IsBitCountGreaterOne(BitBoard bb);
>
>for statements like this
>
>if ( BitCount(bb) > 1 )
>
>I prefere methods without table-lookups, even if they are a bit faster in this
>kind of dumb loop measurement. But on the other hand these excessive
>computations, keeping most pipes busy, are not so hyperthreading friendly.
>
>Cheers,
>Gerd

I haven't had time to follow this thread closely (I am saving it!) but I have
a more general question. Is it really necessary to have this routine with
bitboards? My program is based on bitboards and I do not need to use anything
like bsr etc. The point is, do we really need to get the first bit and only that
first bit at any point? In my program, when I need the first bit I also need all
the others afterwards, so I just have a function that translate
a bitboard to a list of squares in only one go. In this way, things can be done
more efficiently and I use a system of lookup tables. Not only that, I do not
even need to use 64 bit numbers for the calculation. I can split it in two 32
bits halves and work with them separately. In theory that is better. Last time I
checked this function took like 3% of my total time. Granted, that is not a good
parameter because my program is slow in general.

Anyway, the question is, The point is, do we really need to get the first bit
and only that first bit at any point? At least I don't, but I wonder if it is
because I don't do something that other people do somewhere else in the program.

void
sqlist_frombb (SQUARE *ps, BITBOARD bb)
{
	uint32_t		a, cc;
	unsigned int	r;
	unsigned int	c;

	ASSERT( ps != NULL);

	/*------------------------------------*/

        /* get firs half of the bitboard */
	a = bb.hf[0];

	r = 0; /* loop thru different rows "r" */
	while (a != 0) {

                /* get from lookup table a compacted list of "columns"
                   that correspond to 8 bits maximum ==> 8 "columns"...
                   each square is compacted in 4 bits so I can have 8
                   possible columns in a 32 bit number, stored in the array
                   pcol_compact,
                   a & 0xff gets the first "row" of the bitboard that might
                   contain up to 8 bits ==> "columns"
                   cc is the compacted column information */

		cc = pcol_compact [a & 0xff];

                /* get first (next) column in c, */
		c = cc & 0xf;
		cc >>= 4;

                /* add a terminator NOCOLUMN to the compacted list */
		cc |= NOCOLUMN << 28;

		while (c != NOCOLUMN) {
                        /* build a square from the column and row info */
			*ps++ = r | c;

                        /* get (next) column in c, */
			c = cc & 0xf;
			cc >>= 4;
		}
		a >>= 8;
		r += 8;;
	}

        /* do everything again for the second half of the bitboard
           this is not commented */

        a = bb.hf[1];
	r = 32;
	while (a != 0) {
		cc = pcol_compact [a & 0xff];
		c = cc & 0xf;
		cc >>= 4;
		cc |= NOCOLUMN << 28;
		while (c != NOCOLUMN) {
			*ps++ = r | c;
			c = cc & 0xf;
			cc >>= 4;
		}
		a >>= 8;
		r += 8;;
	}

	*ps = NOSQUARE;
}


Regards,
Miguel


>
>
>>--
>>GCP



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.