Author: Matt Taylor
Date: 09:20:19 12/21/02
Go up one level in this thread
On December 21, 2002 at 05:16:18, Walter Faxon wrote:
>On December 21, 2002 at 02:40:42, Matt Taylor wrote:
>
>>OK, after a bit of work cleaning up an undergrad project (the buggy source
>>highlighter that I used to convert the code -> html), I've got the code up.
>>
>>Timings:
>>http://65.186.75.165/~mtaylor/bitscan_athlon.html
>>
>>Code:
>>http://65.186.75.165/~mtaylor/ipcbench.c.html
>>http://65.186.75.165/~mtaylor/bitscan.c.html
>>
>>I'll run results for an original K6 tomorrow just as a matter of curiousity. I
>>don't have access to -ANY- Intel machines right now, so I can't get Intel data.
>>If anyone wants to help out, I can toss the binary up on FTP. Offhand I don't
>>think it will run terribly well on a P4 and possibly not very well on previous
>>Intel processors.
>>
>>-Matt
>
>
>Matt -- very interesting!
>
>One possible improvement that doesn't translate to pure C, is the note in the
>following:
>
>// ---------------------------------------------------------------------------
>// LSB_64_r1() -- find, remove, report least-significant bit of 64.
>// Argument 'bb' must be non-null. Method: fold then table lookup.
>// Written by Walter Faxon, June 2002. No copyright. No warranty.
>//
>inline // inline declaration may differ by compiler
>u8 LSB_64_r1( u64* bb )
> {
> u64 t64;
> u32 t32;
> u8 t8;
> t64 = *bb - 1;
> //
> // NOTE: at this point, normal assembly code would leave the
> // carry flag set when the input (bb == 0). If a "break" or
> // "goto" on that flag is possible, one could for example, loop
> // without any leading test for (bb != 0).
> //
> *bb &= t64; // comment here earlier was artifact
> t64 ^= *bb;
> t32 = (u32)t64 ^ (u32)(t64 >> 32);
> t32 ^= LSB_64_magic;
> t32 += t32 >> 16;
> t8 = (u8)t32 - (u8)(t32 >> 8); // regs addressable by byte
> return LSB_64_table [LSB_64_adj + t8];
> }
Very true about the conditional. I think that would favor heavier workloads;
otherwise you'll do some needless arithmetic. I'll see about implementing that
in my integer version.
>Of course, it might also help in some other code.
>
>Question: would using a pure-C routine to remove/report a bit in a 32-bit word
>just be better? It would require an additional loop, so at least one additional
>variable, so I thought it best to skip it. But you wouldn't have to update both
>halves each run through. That'd save some time and registers, at a premium
>here. I can post the 32-bit code and table if you like.
>
>Or maybe we're just flogging this one little function to death... :)
>
>-- Walter
No, please do post the 32-bit version! I would expect it to be faster on our
existing chips. On a 64-bit chip, your 64-bit version routine will probably be
fastest.
-Matt
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.