Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Beating Bitscan to Death -- yet again

Author: Walter Faxon

Date: 15:12:29 12/30/02

Go up one level in this thread


On December 24, 2002 at 23:28:27, Matt Taylor wrote:

>http://65.186.75.165/~mtaylor/bitscan/bitscan.html
>
<snip>
>
>-Matt


Hi, Matt.

Well, here is Yet Another Bit Scanner (yabs), this time based on my 32-bit
folding code.  Logically this one shouldn't require use of more than three
general purpose registers, and thus it should be about as fast as anything I've
posted before.  Just for fun, I tried to be super-pedantic and thus guide any
reasonable late-Pentium compiler into producing near-optimal code.
Consequently, you should be able to hand-translate this into assembler in about
3 minutes, ignoring register choices and scheduling issues.

I tried it both as an inlined routine and as a macro (not given here) which
would directly break out of a bit scanning loop without a surrounding test.

Trying so hard turned out to be a mistake.  Unfortunately, the compilers I use
(a slightly dated gcc and MetaWare's High-C, both well-recommended and run at
their highest optimization settings) have proven to be execrable pupils.  I
could not have imagined such awful assembly code.  I won't burden you or this
list with examples.

Maybe your compilers are really much better.  But you yourself have complained
about some of the stupid choices they make.

My goal was a fast bit-scanner in C.  I was willing to be slightly pedantic for
a performance increment.  But if I have to write in assembler anyway, with all
of its issues, why bother?

I'm bummed.

-- Walter

IMPORTANT:  Bulk code follows.  Responders should <snip> it.
// ---------------------------------------------------------------------------
typedef unsigned long long u64;                 // nonstandard
typedef unsigned long      u32;
typedef unsigned char      u8;

extern const u8 LSB_32_table[134];              // bit number table
#define LSB_32_magic  ( (u32)0x05407DB7 )       // magic constant

// ---------------------------------------------------------------------------
// yabs() -- Yet Another Bit Scanner:  find, remove, report, least
// significant (set) bit (LSB) found in a 64-bit bitboard.
// The low-order bit is numbered 0, the high-order numbered 63.
// Returns -1 on null argument (see "===>").
// Ultra-pedantic C code for late x86 processors (having CMOVE) with
// good compilers; should only require 3 32-bit GPRs.
// Written December 2002 by Walter Faxon.  No copyright.  No warranty.
//
inline                              // force inlining; differs by compiler
int yabs( u64* bitboard_p )
    {
    u32* bbp = (u32*)bitboard_p;    // rename bitboard parameter (no code)
    register u32 bits;              // active half of board
    register union                  // single LSB and folding
        {
        u32 dword;
        struct {
            u8 low, high;
            } bytes;
        } bit;
    register union                  // index of active half
        {
        u32 dword;
        u8 byte;
        } index;

    bits = bbp[0];                  // get low half
    bit.dword = bbp[1];             // and high half
    index.dword = 0;
    index.byte = (bits == 0);       // index of active half (SETCC)
    if (bits == 0)                  // if low exhausted,
        bits = bit.dword;           // use high (CMOVE, no branch)

    bit.dword = -bits;              // get lone LSB, if any...
    // ===> Omit test and branch when input guaranteed non-null. <===
    if (bit.dword == 0)
        return -1;                  // if none, finished
        // ===> If converted to macro in loop, use break instead. <===
    bit.dword &= bits;              // ...got lone LSB

    bits ^= bit.dword;              // remove LSB from active half
    bbp[index.dword] = bits;        // update changed half of bitboard

    index.byte <<= 5;               // (or index.dword); index now {0,32}

    // folding/lookup process
    bit.dword--;
    bit.dword ^= LSB_32_magic;
    bit.dword += bit.dword >> 16;
    bit.bytes.low -= bit.bytes.high;        // regs addressable by byte
    bit.dword &= (u32)0x000000FF;           // make ready for indexing use
    index.byte += LSB_32_table[bit.dword];  // {0,32} + {0..31} ==> {0..63}

    return (int)index.dword;
    }

// ---------------------------------------------------------------------------
// Table reports number of low-order bit as 0, high-order as 31.
// Important:  arrange storage so that this table is kept in the cache.
const u8 LSB_32_table[134] =
    {
#define __  -1
    23,16,17,__,18,10, 8, 9,19,11, 31,__,__,__,__,__,20,12,__,__,
    __,__,__,__,__,__,__,__,__,__, __,__,21,13,__,__,__,__,__,__,
    __,__,__,__,__,__,__,__,__,__, __,__,__,__,__,__,__,__,__,__,
    __,__,__,__,22,14,__,__,__,__,  6,__,__,__,30,__,__,__,__,__,
    __,__,__,__,__,__,__,__,__,__, __,__,__,__,__,__,__,__,__,__,
    __,__, 5,__,__,__,29,__,__,__,  3,__,__,__, 2,__, 1, 0, 4,__,
    __,__,28,__,__,__,26,24,25,15, 27,__,__, 7
#undef __
    };

// eof



This page took 0.01 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.