Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Bit Scan Timings -- possible artifact?

Author: Matt Taylor

Date: 22:57:40 12/22/02

Go up one level in this thread


On December 23, 2002 at 01:42:31, Walter Faxon wrote:

>On December 22, 2002 at 20:10:45, Matt Taylor wrote:
>
>>snip<
>
>>I've added several tests using your 32-bit routine. It actually works a little
>>better, I think. (It also helps that the table is 20 bytes smaller -- but that's
>>mostly irrelevant.) I also added the "naive" method which looks roughly like
>>this:
>>
>>int shift_scan(bitboard bb)
>>{
>>    index = 0;
>>    while(bb != 0 && (bb & 1) == 0)
>>    {
>>        index++;
>>        bb >>= 1;
>>    }
>>
>>    return index;
>>}
>>
>>One other interesting possibility opens up due to using a 32-bit scan routine
>>like that: parallelism! The MMX routine performs poorly when applied to the
>>64-bit algorithm because it is difficult to fold at the end. I'm working on a
>>32-bit version. Initially it was -almost- the fastest bitscan yet, but then I
>>realized that I had a bug. I can do the bitscans in parallel, but I can't
>>actually clear both bits -- only the one I return. The current code ends up
>>throwing a bit away, cutting the time in half. After I fix the bug, we'll see
>>how it changes things.
>>
>>You know, it would also be interesting also to have a routine return two bits
>>and do some unrolling. It would make MMX invaluable as you could theoretically
>>grab 8 bits at a time.
>>
>>-Matt
>
>
>Hey, Matt.
>
>Well, as long as you're adding other test code, below is one more 32-bit
>routine, adapted from a post by Gerd last June (in fact, made super-pedantic).
>No table required.
>
>-- Walter
>
>Code follows:
>// ---------------------------------------------------------------------------
>// LSB_32x() -- find, remove, report least-significant bit of 32.
>// Argument must be non-null.  Method:  convert conditional tests to 0/1.
>// Based on 64-bit code posted to CCC by Gerd Isenberg, June 18, 2002.
>// Written by Walter Faxon, December 2002.  No copyright.  No warranty.
>//
>inline                      // may differ by compiler
>int
>LSB_32x( u32* value_p )
>    {
>    u32 copy, bit, index;
>
>    copy = *value_p;        // get 32-bit value
>    bit = -copy;            // get LSB as single bit..
>    //
>    // NOTE:  at this point, normal assembly code would leave the
>    // zero flag set when the input (*value_p == 0).  If a "break" or
>    // "goto" on that flag is possible, one could for example, loop
>    // without any leading test for (*value_p != 0).
>    //
>    bit &= copy;            // ..done
>    copy ^= bit;            // remove from value
>    *value_p = copy;        // and store back
>
>    // convert conditional tests to 0/1, building binary number
>    index  = ((bit & 0xFFFF0000) != 0);
>    index <<= 1;
>    index |= ((bit & 0xFF00FF00) != 0);
>    index <<= 1;
>    index |= ((bit & 0xF0F0F0F0) != 0);
>    index <<= 1;
>    index |= ((bit & 0xCCCCCCCC) != 0);
>    index <<= 1;
>    index |= ((bit & 0xAAAAAAAA) != 0);
>
>    return index;
>    }
>
>//eof

I'll include that one tomorrow as I'm getting sleepy. I've been debugging silly
bugs in the code for the past ... hours. I was excited when I saw a two-fold
speed improvement from the LSB-32. It was rather disappointing when I discovered
a bug that caused it to ignore other bits... sigh.

I added 4 new tests: my own hand-optimized version of your LSB-32, VC 7's
version, an MMX version, and the "naive" shift routine.

The MMX routine was the biggest headache out of this entire night. The numbers
came back and it was just short of being the fastest routine of the bunch. Then
I made a discovery: it was chomping bits in pairs (duh), and therefore it was
twice as fast as it should have been. Now that I've fixed that, it's even slower
than the 64-bit version. Sigh...

-Matt



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.