Author: Eugene Nalimov
Date: 16:24:04 01/05/99
On January 05, 1999 at 01:25:46, Dann Corbit wrote:
>I would be curious to see timings of the assembly language variants versus this
>simple C doo-dad:
>#include <limits.h>
>#include <stdlib.h>
>#if CHAR_BIT == 8
>static const char bits[256] =
>{
> 0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4,
> 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
> 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
> 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
> 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
> 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
> 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
> 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
> 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
> 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
> 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
> 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
> 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
> 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
> 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
> 4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8
>};
>#else
>#error "bits[] assumes CHAR_BIT == 8; supply a corrected table"
>#endif
>
>/*
> ** Count bits in each byte
> **
> ** by Auke Reitsma
> **
> ** Torqued by D. Corbit
> ** This version makes no assumptions about integer size.
> ** If CHAR_BIT is not equal to 8, you will have to provide
> ** a corrected table (see above).
> */
>
>int bit_count_bytes(unsigned long x)
>{
>    unsigned char * Ptr = (unsigned char *) &x;
>    int Accu;
>    switch (sizeof(x))
>    {
>    case 4:
>        Accu = bits[Ptr[0]] + bits[Ptr[1]] + bits[Ptr[2]] + bits[Ptr[3]];
>        break;
>    case 8:
>        Accu = bits[Ptr[0]] + bits[Ptr[1]] + bits[Ptr[2]] + bits[Ptr[3]] +
>               bits[Ptr[4]] + bits[Ptr[5]] + bits[Ptr[6]] + bits[Ptr[7]];
>        break;
>    default:
>        {
>            size_t i;
>            Accu = 0;
>            /* Sum the table entry for every byte of x (not of an int). */
>            for (i = 0; i < sizeof(x); i++)
>                Accu += bits[Ptr[i]];
>        }
>    }
>    return Accu;
>}
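The quoted comment asks for a corrected table whenever CHAR_BIT is not 8. One way to avoid typing that table by hand is to build it at startup from the recurrence bits(i) = (i & 1) + bits(i >> 1). The sketch below assumes a hosted C compiler; the names `bits_table` and `init_bits_table` are hypothetical and not part of the quoted code.

```c
#include <limits.h>

/* Hypothetical table: one entry per possible unsigned char value,
   so its size follows CHAR_BIT automatically. */
static unsigned char bits_table[UCHAR_MAX + 1];

/* Fill the table using bits(i) = (i & 1) + bits(i >> 1).
   Shifting right by one drops exactly the lowest bit, and (i & 1)
   adds that bit's contribution back. Entries are filled in increasing
   order, so bits_table[i >> 1] is always ready when bits_table[i] is
   computed. */
void init_bits_table(void)
{
    unsigned long i;   /* wider than unsigned char, so the loop ends */

    bits_table[0] = 0;
    for (i = 1; i <= UCHAR_MAX; i++)
        bits_table[i] = (unsigned char)((i & 1u) + bits_table[i >> 1]);
}
```

Call `init_bits_table()` once before the first lookup; with CHAR_BIT == 8 it produces the same 256 values as the hand-written table.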
I slightly modified the routine so that it works on 8-byte __int64 values
instead of 4-byte integers. The test input is 70 __int64 integers with
0-2 bytes set. Setup: VC++ 6.0, Pentium Pro 200 MHz, NT 4.0.
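The modified routine itself is not shown. As a rough sketch of how a 64-bit table-lookup version might look, the code below assumes a C99 compiler with `uint64_t` (standing in for MSVC's `__int64`); the names `bits8` and `bit_count_u64` are hypothetical. It extracts 8-bit chunks with shifts and masks rather than reading the value through an `unsigned char` pointer, so the result does not depend on byte order or on CHAR_BIT. The table is generated by a well-known macro trick instead of being typed out.

```c
#include <stdint.h>

/* Popcount of every 8-bit value, generated at compile time.
   B2(n) lists the counts of the 2-bit patterns 00, 01, 10, 11 plus n;
   B4 and B6 prepend two more high bits in the same way, and the four
   B6 groups cover the top two bits of the byte: 4*4*4*4 = 256 entries. */
#define B2(n) (n), (n) + 1, (n) + 1, (n) + 2
#define B4(n) B2(n), B2((n) + 1), B2((n) + 1), B2((n) + 2)
#define B6(n) B4(n), B4((n) + 1), B4((n) + 1), B4((n) + 2)
static const unsigned char bits8[256] = { B6(0), B6(1), B6(1), B6(2) };

/* Hypothetical 64-bit variant: add the table entry for each of the
   eight 8-bit chunks of x, lowest chunk first. */
int bit_count_u64(uint64_t x)
{
    int accu = 0;
    unsigned shift;

    for (shift = 0; shift < 64; shift += 8)
        accu += bits8[(x >> shift) & 0xFF];
    return accu;
}
```

For example, `bit_count_u64(0x8000000000000001)` counts one bit in the lowest chunk and one in the highest, returning 2.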
With both routines inlined, the assembly routine is 3.3 times faster.
With neither routine inlined, the assembly routine is 2.6 times faster.
Eugene
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.