Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Question for the Crafty/Compiler experts

Author: Dieter Buerssner

Date: 11:31:58 02/17/04

Go up one level in this thread


On February 16, 2004 at 22:57:17, Robert Hyatt wrote:

>On February 16, 2004 at 18:25:41, Dieter Buerssner wrote:
>
>>I have seen this code many times. I also commented on it many times (never got
>>any answer to the point). Why would you want to code this in inline assembly? To
>>me it seems, equivalent C code (possibly micro optimized fór some environment,
>>but still more general than the assembly) will run faster (I showed examples
>>several times - IIRC including examples of what MSVC made of the C code and some
>>reasoning about not needed "mov ecx, dword ptr a" when using C).
>>
>>Regards,
>>Dieter
>
>I don't do the MSVC inline stuff, obviously, since I don't use windows and MSVC.
> However, the gcc inline generalized register facility seems more appropriate
>for inline asm to avoid register saves and restores.

Sure. Also, it gives the compiler more liberty to allocate registers (you don't
have to hard code the register names).

>I'm not sure the PopCnt()
>inline is any better than the pure C version in Crafty.

I doubt, the inline assembly is much faster (even with Gcc type inline
assembly). The following routine produces almost exactly the same code (with Gcc
and MSVC):

#ifdef __GNUC__
#define MYINLINE static __inline__
typedef unsigned long long BITBOARD;
#else
#define MYINLINE static __forceinline
typedef unsigned __int64 BITBOARD;
#endif

/* Note, this is not strictly portable, even not the pure C code*/
typedef union
{
  BITBOARD b;
  unsigned long w[2];
  /* unsigned char bytes[8]; */
} BU;


MYINLINE int dieter_popc(BU a)
{
  unsigned long w;
  int n = 0;
  w = a.w[0];
  if (w)
    do
      n++;
    while ((w &= w-1) != 0);
  w = a.w[1];
  if (w)
    do
      n++;
    while ((w &= w-1) != 0);
  return n;
}

The union is only there, to avoid some nasty casts. Using shifts and masks might
(depending on compiler and optimization options) create the same code (and would
be totally portable). Endianess is no issue for bitcount (doesn't matter which
is the high and low word).

BTW. The Popcount you used earlier in x86.S would probably also run almost as
fast when using corresponding C code. It missed some possible optimization, too.
For the last stages of the algorithm, the two 32-bit halves of the Bitboard can
be added without any problem, which saves quite a few instructions.

#define COMBINE_COUNTERS(x,m,s) x = (x&m) + ((x>>s)&m)

MYINLINE int bitfiddling(BU bb)
{
  unsigned long u, v;

  v = bb.w[1];
  u = bb.w[0];

  COMBINE_COUNTERS(u, 0x55555555, 1);
  COMBINE_COUNTERS(v, 0x55555555, 1);
  /* Depending on compiler, optimzation options, free registers,
     specific inlining situation, one or the other might be slightly faster. */
#if 0
  COMBINE_COUNTERS(u, 0x33333333, 2);
  COMBINE_COUNTERS(v, 0x33333333, 2);
  /* Now the counters have space to add the 2 parts
     without overflow */
  u += v;
#else
  u = (u & 0x33333333) + ((u >> 2) & 0x33333333)
       + (v & 0x33333333) + ((v >> 2) & 0x33333333);
#endif
  /* In earlier x86.s, the following 3 stages were done twice */
  COMBINE_COUNTERS(u, 0x0f0f0f0f, 4);
  /* return u%255; */ /* If you have very fast % */
  /* return (u * 0x01010101) >> 24; */ /* If you have very fast mul */
  /* Masks not needed for the last two steps. */
  /* COMBINE_COUNTERS(u, 0x00ff00ff, 8); */
  u += u >> 8;
  /* COMBINE_COUNTERS(u, 0x0000ffff, 16); */
  u += u >> 16;
  return u & 0xff;  /* 0x7f would be enough as mask. */
}

I bet, this is faster, than what you had in x86.s (neeeds 3 shifts, ands, movs,
adds (or leas) less). Gcc does not produce optimal assembly, but very reasonable
one - I think about 7 movs could be saved by hand coding. Also, it can be
inlined, which the stuff in x86.s can't. For sparse populated values, the loop
will probably be faster in many environments.

Regards,
Dieter




This page took 0.01 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.