Author: Gerd Isenberg
Date: 16:43:15 12/02/02
Go up one level in this thread
On December 02, 2002 at 18:49:10, Walter Faxon wrote:
>On December 02, 2002 at 16:48:52, Gerd Isenberg wrote:
>
>>May be i found not the best way to to b&(-b) with mmx, but building the 64-bit
>>two's complement with mmx-dword is not so nice. First you have to do the one's
>>complement by pxor -1, then comparing low dword with -1 and building an
>>conditional overflow, adding 00:01 or 01:01...
>>So using 32-bit registers was the fastest so far, but that may require some
>>additional push/pop.
>>
>>Some times mesured in seconds with this dumb loop (nothing inlined):
>>K7XP2.1+ ~1.8GHz
>>
Congrats, Walter!!
10-bit pattern bsf PI2FD btr c LSB_64
0x0000000011111133 15.3 18.0 19.1 22.8 17.8
0x1010111010101110 19.7 18.5 19.6 23.4 17.8
0x1111113300000000 20.6 18.0 19.1 22.8 17.8
>>
>>inlined are ~5 seconds faster
>Hi, Gerd.
>
>As long as you're test-comparing bit-search-and-reset codes, I wonder if you
>could please consider also comparing my C code version, posted on CCC with the
>subject "Another hacky method for bitboard bit extraction" on November 17. I
>repeat it below. You can of course make those changes required for proper
>compilation and comparison in your setup. Thanks!
>
>-- Walter
>
>Code follows:
>// ---------------------------------------------------------------------------
>
>typedef unsigned long long u64; // nonstandard
>typedef unsigned long u32;
>typedef unsigned char u8;
>
>extern const u8 LSB_64_table[154]; // bit number table
>#define LSB_64_adj -51 // offset to table base
>#define LSB_64_magic ( (u32)0x01C5FC81 ) // magic constant
>
>// ---------------------------------------------------------------------------
>// LSB_64() -- find, remove, report least-significant bit of 64.
>// Argument 'bb' must be non-null. Method: fold then table lookup.
>// Written by Walter Faxon, June 2002. No copyright. No warranty.
>//
>inline // inline declaration may differ by compiler
>u8 LSB_64( u64* bb )
> {
> u64 t64;
> u32 t32;
> t64 = *bb - 1;
> *bb &= t64; // omit this line to retain current LSB
> t64 ^= *bb;
> t32 = (u32)t64 ^ (u32)(t64 >> 32);
> t32 ^= LSB_64_magic;
> t32 += t32 >> 16;
> t32 -= t32 >> 8;
> return LSB_64_table [LSB_64_adj + (u8)t32];
> }
>
>// ---------------------------------------------------------------------------
>// Table reports number of low-order bit as 0, high-order as 63.
>// (Numbering can be reversed by changing this table.)
>// Important: arrange storage so that this table is kept in the cache.
>const u8 LSB_64_table[154] =
> {
>#define __ 0
> 23,__,__,__,31,__,__,39,19,__, 17,16,18,__,47,10,20, 9, 8,11,
> 1, 0, 2,57,56,58, 3,12,__,59, __,__,21,__, 4,__,__,60,__,__,
> __,__,__,13,__,__,__,__,__,__, 5,__,__,61,__,__,__,__,__,__,
> __,__,__,__,22,__,__,__,30,__, __,38,__,__,__,14,__,__,46,__,
> __,__, 6,__,__,62,__,__,__,54, __,__,__,__,__,__,__,__,__,__,
> 29,__,__,37,__,__,__,__,__,__, 45,__,__,__,__,__,28,__,__,36,
> __,53,__,__,27,__,44,35,26,24, 25,34,32,33,43,40,41,52,42,15,
> __,50,48,49,__,51, 7,__,__,63, __,__,__,55
>#undef __
> };
>
>//eof
>
>P.S. You can even avoid the table lookup if you're willing to deal with
>scrambled square indecies in the range 0 to 153.
>
>P.P.S. I feel a little like the unlucky scientist whose results were lost
>because they were only published _four_ times... :)
Geniously, sorry Walter, that i overlooked your approach. It is clearly the
fastest of these routines. De Bruijn Sequence? May have a close look to the
constant.
I'm really impressed.
Regards,
Gerd
This page took 0.01 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.