Author: Walter Faxon
Date: 02:16:18 12/21/02
Go up one level in this thread
On December 21, 2002 at 02:40:42, Matt Taylor wrote:
>OK, after a bit of work cleaning up an undergrad project (the buggy source
>highlighter that I used to convert the code -> html), I've got the code up.
>
>Timings:
>http://65.186.75.165/~mtaylor/bitscan_athlon.html
>
>Code:
>http://65.186.75.165/~mtaylor/ipcbench.c.html
>http://65.186.75.165/~mtaylor/bitscan.c.html
>
>I'll run results for an original K6 tomorrow just as a matter of curiousity. I
>don't have access to -ANY- Intel machines right now, so I can't get Intel data.
>If anyone wants to help out, I can toss the binary up on FTP. Offhand I don't
>think it will run terribly well on a P4 and possibly not very well on previous
>Intel processors.
>
>-Matt
Matt -- very interesting!
One possible improvement that doesn't translate to pure C, is the note in the
following:
// ---------------------------------------------------------------------------
// LSB_64_r1() -- find, remove, report least-significant bit of 64.
// Argument 'bb' must be non-null. Method: fold then table lookup.
// Written by Walter Faxon, June 2002. No copyright. No warranty.
//
inline // inline declaration may differ by compiler
u8 LSB_64_r1( u64* bb )
{
u64 t64;
u32 t32;
u8 t8;
t64 = *bb - 1;
//
// NOTE: at this point, normal assembly code would leave the
// carry flag set when the input (bb == 0). If a "break" or
// "goto" on that flag is possible, one could for example, loop
// without any leading test for (bb != 0).
//
*bb &= t64; // comment here earlier was artifact
t64 ^= *bb;
t32 = (u32)t64 ^ (u32)(t64 >> 32);
t32 ^= LSB_64_magic;
t32 += t32 >> 16;
t8 = (u8)t32 - (u8)(t32 >> 8); // regs addressable by byte
return LSB_64_table [LSB_64_adj + t8];
}
Of course, it might also help in some other code.
Question: would using a pure-C routine to remove/report a bit in a 32-bit word
just be better? It would require an additional loop, so at least one additional
variable, so I thought it best to skip it. But you wouldn't have to update both
halves each run through. That'd save some time and registers, at a premium
here. I can post the 32-bit code and table if you like.
Or maybe we're just flogging this one little function to death... :)
-- Walter
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.