Author: Eugene Nalimov
Date: 16:24:04 01/05/99
On January 05, 1999 at 01:25:46, Dann Corbit wrote:

>I would be curious to see timings of the assembly language variants versus this
>simple C doo-dad:
>#include <limits.h>
>#include <stdlib.h>
>#if CHAR_BIT == 8
>static const char bits[256] =
>{
>    0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4,
>    1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
>    1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
>    2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
>    1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
>    2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
>    2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
>    3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
>    1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
>    2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
>    2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
>    3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
>    2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
>    3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
>    3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
>    4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8
>};
>#else
>PLEASE FIX ME.
>#endif
>
>/*
> ** Count bits in each byte
> **
> ** by Auke Reitsma
> **
> ** Torqued by D. Corbit
> ** This version makes no assumptions about integer size.
> ** If CHAR_BIT is not equal to 8, you will have to provide
> ** a corrected table (see above).
> */
>
>int bit_count_bytes(unsigned long x)
>{
>    unsigned char * Ptr = (unsigned char *) &x;
>    int Accu;
>    switch (sizeof(x))
>    {
>    case 4:
>        Accu = bits[Ptr[0]] + bits[Ptr[1]] + bits[Ptr[2]] + bits[Ptr[3]];
>        break;
>    case 8:
>        Accu = bits[Ptr[0]] + bits[Ptr[1]] + bits[Ptr[2]] + bits[Ptr[3]] +
>               bits[Ptr[4]] + bits[Ptr[5]] + bits[Ptr[6]] + bits[Ptr[7]];
>        break;
>    default:
>        {
>            size_t i;
>            Accu = 0;
>            for (i = 0; i < sizeof(int); i++)
>                Accu += bits[Ptr[i]];
>        }
>    }
>    return Accu;
>}

I slightly modified the routine so that it works on 8-byte __int64 values instead of 4-byte integers. The test input is 70 __int64 integers with 0-2 bytes set. Environment: VC++ 6.0, PPro/200, NT 4.0.
Both routines inlined: the assembly routine is 3.3 times faster.
Both routines non-inlined: the assembly routine is 2.6 times faster.

Eugene
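
The modified 8-byte routine itself is not shown in the post. As a rough illustration only, a table-driven count over a 64-bit value might look like the sketch below. The names bits_table, init_bits_table, bit_count_bytes64 and bit_count_reference are hypothetical, uint64_t stands in for the __int64 type mentioned above, and the table is generated at startup rather than written out by hand. This is not the code that was timed, and it says nothing about the assembly routine.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#if CHAR_BIT != 8
#error "this sketch assumes 8-bit bytes"
#endif

/* Per-byte population counts; filled in once by init_bits_table(). */
static unsigned char bits_table[256];

/* Build the table: the count for i is the count for i / 2 plus the
 * lowest bit of i. Entry 0 is already 0 (static storage). */
static void init_bits_table(void)
{
    int i;
    for (i = 1; i < 256; i++)
        bits_table[i] = (unsigned char)(bits_table[i >> 1] + (i & 1));
}

/* Hypothetical 8-byte variant: one table lookup per byte.
 * Shifting instead of aliasing through a pointer keeps the code
 * independent of byte order. */
static int bit_count_bytes64(uint64_t x)
{
    int count = 0;
    size_t i;
    for (i = 0; i < sizeof x; i++) {
        count += bits_table[x & 0xFF];
        x >>= 8;
    }
    return count;
}

/* Reference implementation for cross-checking: each pass clears
 * the lowest set bit, so the loop runs once per set bit. */
static int bit_count_reference(uint64_t x)
{
    int count = 0;
    while (x) {
        x &= x - 1;
        count++;
    }
    return count;
}
```

init_bits_table() has to be called once before the first call to bit_count_bytes64(). The reference routine is only there to check the table version against an independent method.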
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.