Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Bit Scan Timings

Author: Matt Taylor

Date: 09:20:19 12/21/02

Go up one level in this thread


On December 21, 2002 at 05:16:18, Walter Faxon wrote:

>On December 21, 2002 at 02:40:42, Matt Taylor wrote:
>
>>OK, after a bit of work cleaning up an undergrad project (the buggy source
>>highlighter that I used to convert the code -> html), I've got the code up.
>>
>>Timings:
>>http://65.186.75.165/~mtaylor/bitscan_athlon.html
>>
>>Code:
>>http://65.186.75.165/~mtaylor/ipcbench.c.html
>>http://65.186.75.165/~mtaylor/bitscan.c.html
>>
>>I'll run results for an original K6 tomorrow just as a matter of curiousity. I
>>don't have access to -ANY- Intel machines right now, so I can't get Intel data.
>>If anyone wants to help out, I can toss the binary up on FTP. Offhand I don't
>>think it will run terribly well on a P4 and possibly not very well on previous
>>Intel processors.
>>
>>-Matt
>
>
>Matt -- very interesting!
>
>One possible improvement that doesn't translate to pure C, is the note in the
>following:
>
>// ---------------------------------------------------------------------------
>// LSB_64_r1() -- find, remove, report least-significant bit of 64.
>// Argument 'bb' must be non-null.  Method:  fold then table lookup.
>// Written by Walter Faxon, June 2002.  No copyright.  No warranty.
>//
>inline                  // inline declaration may differ by compiler
>u8 LSB_64_r1( u64* bb )
>    {
>    u64 t64;
>    u32 t32;
>    u8 t8;
>    t64 = *bb - 1;
>    //
>    // NOTE:  at this point, normal assembly code would leave the
>    // carry flag set when the input (bb == 0).  If a "break" or
>    // "goto" on that flag is possible, one could for example, loop
>    // without any leading test for (bb != 0).
>    //
>    *bb &= t64;         // comment here earlier was artifact
>    t64 ^= *bb;
>    t32 = (u32)t64 ^ (u32)(t64 >> 32);
>    t32 ^= LSB_64_magic;
>    t32 += t32 >> 16;
>    t8 = (u8)t32 - (u8)(t32 >> 8);      // regs addressable by byte
>    return LSB_64_table [LSB_64_adj + t8];
>    }

Very true about the conditional. I think that would favor heavier workloads;
otherwise you'll do some needless arithmetic. I'll see about implementing that
in my integer version.

>Of course, it might also help in some other code.
>
>Question:  would using a pure-C routine to remove/report a bit in a 32-bit word
>just be better?  It would require an additional loop, so at least one additional
>variable, so I thought it best to skip it.  But you wouldn't have to update both
>halves each run through.  That'd save some time and registers, at a premium
>here.  I can post the 32-bit code and table if you like.
>
>Or maybe we're just flogging this one little function to death... :)
>
>-- Walter

No, please do post the 32-bit version! I would expect it to be faster on our
existing chips. On a 64-bit chip, your 64-bit version routine will probably be
fastest.

-Matt



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.