Computer Chess Club Archives


Subject: Re: A data point for PowerPC bitboard program authors

Author: Robert Hyatt

Date: 07:01:53 05/09/05


On May 09, 2005 at 07:48:51, Steven Edwards wrote:

>The PowerPC architecture has an instruction for counting the leading zero bits
>in a word.  This is either a 32-bit or a 64-bit operation depending on the CPU
>model and mode in use.  Naturally, this instruction can be helpful for optimizing
>the FirstSq and NextSq functions.  Unfortunately, there is no PowerPC
>instruction for bit counting.
>
>Here's the sample wrapper for the leading zero bit counter from a GCC header
>file:
>
>static inline int __cntlzw (int value) __attribute__((always_inline));
>static inline int
>__cntlzw (int value)
>{
>  long result;
>  __asm__ ("cntlzw %0, %1"
>           /* outputs:  */ : "=r" (result)
>           /* inputs:   */ : "r" (value));
>  return result;
>}
>
>The actual code for FirstSq/NextSq will depend on the specific bit/square
>correspondence.  For Symbolic's toolkit, the CTBB::NextSq() member function is:
>
>#if (CTHostMac && CTArchBits32 && CTAllowAssembly)
>  CTSq NextSq(void)
>  {
>    int theZC = __cntlzw(myDwrdVec[0]);
>
>    if (theZC != 32)
>    {
>      myDwrdVec[0] ^= 1U << (31 - theZC);  // unsigned: 1 << 31 overflows a signed int
>      return (CTSq) (theZC ^ 0x07);
>    }
>    else
>    {
>      theZC = __cntlzw(myDwrdVec[1]);
>      if (theZC != 32)
>      {
>        myDwrdVec[1] ^= 1U << (31 - theZC);  // unsigned: 1 << 31 overflows a signed int
>        return (CTSq) ((theZC ^ 0x07) + 32);
>      }
>      else
>        return CTSqNil;
>    };
>  }
>#endif
>
>Using the above code vs. the table lookup model results in a speed increase of
>about 16% when running movepath enumeration (i.e., perft).  Overall speedup is
>somewhat less although still significant.
>
>The above "(theZC ^ 0x07)" operations could be removed if the bit/square
>correspondence was altered (reversed) to have a-file squares appear in the MSbit
>instead of the LSbit.  I believe this is how Crafty arranges the bits, but I
>haven't tested it.  Currently, Crafty does not have assembly-language-level
>optimizations for PowerPC.  Possibly some ambitious author could contribute in
>that area.
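
For readers without a PowerPC box, the quoted NextSq logic can be approximated portably. The following is only a sketch, assuming a GCC- or Clang-compatible compiler that provides __builtin_clz; the names Bitboard64, set_sq, next_sq and SQ_NIL are hypothetical stand-ins, not Symbolic's actual types. It keeps the same bit/square correspondence as the quoted code (square = zero count XOR 7 within each 32-bit word).

```c
#include <stdint.h>

/* Hypothetical two-word bitboard: word[0] covers squares 0..31,
   word[1] covers squares 32..63 (mirrors myDwrdVec in the quoted code). */
typedef struct {
    uint32_t word[2];
} Bitboard64;

enum { SQ_NIL = -1 };  /* stand-in for CTSqNil */

/* Set the bit for square sq (0..63) using the quoted correspondence:
   leading-zero count = (sq within word) ^ 7. */
static void set_sq(Bitboard64 *bb, int sq)
{
    int zc = (sq & 31) ^ 0x07;
    bb->word[sq >> 5] |= UINT32_C(1) << (31 - zc);
}

/* Remove and return one set square, or SQ_NIL if the board is empty.
   __builtin_clz is undefined for 0, so each word is tested first;
   the cntlzw instruction itself returns 32 in that case. */
static int next_sq(Bitboard64 *bb)
{
    for (int i = 0; i < 2; i++) {
        uint32_t w = bb->word[i];
        if (w != 0) {
            int zc = __builtin_clz(w);
            bb->word[i] = w ^ (UINT32_C(1) << (31 - zc));
            return (zc ^ 0x07) + 32 * i;
        }
    }
    return SQ_NIL;
}
```

Note that with this correspondence the squares within one rank come out h-file first: a board holding squares 0 and 7 yields 7 and then 0.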


Some non-standard Crafty versions use the PPC stuff through an intrinsic.  The
normal versions do not, but someone wrote the changes for their PPC box.

Yes, a0 -> MSB, because the PPC uses the same approach as the Cray, which rather
than finding the first one bit, counts leading zero bits instead.  I'm going to
change this one day to get rid of the 63-X stuff necessary because of the way
the X86 BSF/BSR count bits from the other end...
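
The "63-X" conversion mentioned above can be illustrated with a small sketch. This is not Crafty's code; it assumes a GCC- or Clang-compatible compiler with __builtin_clzll, and the function names are hypothetical. It shows the same set bit indexed two ways: counted from the MSB (the leading-zero convention of the PPC and the Cray) and counted from the LSB (the convention of X86 BSR).

```c
#include <stdint.h>

/* Index of the highest set bit counted from the MSB (MSB = 0).
   This is what a 64-bit count-leading-zeros returns.
   Precondition: x != 0, since __builtin_clzll(0) is undefined. */
static int high_bit_from_msb(uint64_t x)
{
    return __builtin_clzll(x);
}

/* Index of the same bit counted from the LSB (LSB = 0), as BSR reports it.
   The two conventions differ by exactly the 63-X conversion. */
static int high_bit_from_lsb(uint64_t x)
{
    return 63 - __builtin_clzll(x);
}
```

For any nonzero x the two results always sum to 63, which is why a board layout that matches the hardware's counting direction lets the subtraction disappear.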




