Author: Steven Edwards
Date: 04:48:51 05/09/05
The PowerPC architecture has an instruction for counting the leading zero bits in a word. This is either a 32-bit or a 64-bit operation depending on the CPU model and mode in use. Naturally, this instruction can be helpful for optimizing the FirstSq and NextSq functions. Unfortunately, there is no PowerPC instruction for bit counting (population count).

Here's the sample wrapper for the leading zero bit counter from a GCC header file:

    static inline int __cntlzw (int value) __attribute__((always_inline));
    static inline int __cntlzw (int value)
    {
      long result;
      __asm__ ("cntlzw %0, %1"
               /* outputs: */ : "=r" (result)
               /* inputs:  */ : "r" (value));
      return result;
    }

The actual code for FirstSq/NextSq will depend on the specific bit/square correspondence. For Symbolic's toolkit, the CTBB::NextSq() member function is:

    #if (CTHostMac && CTArchBits32 && CTAllowAssembly)
    CTSq NextSq(void)
    {
      int theZC = __cntlzw(myDwrdVec[0]);

      if (theZC != 32)
      {
        myDwrdVec[0] ^= 1 << (31 - theZC);
        return (CTSq) (theZC ^ 0x07);
      }
      else
      {
        theZC = __cntlzw(myDwrdVec[1]);
        if (theZC != 32)
        {
          myDwrdVec[1] ^= 1 << (31 - theZC);
          return (CTSq) ((theZC ^ 0x07) + 32);
        }
        else
          return CTSqNil;
      }
    }
    #endif

Using the above code instead of the table lookup model results in a speed increase of about 16% when running movepath enumeration (i.e., perft). The overall speedup is somewhat less, although still significant.

The above "(theZC ^ 0x07)" operations could be removed if the bit/square correspondence were altered (reversed) to have a-file squares appear in the MSbit instead of the LSbit. I believe this is how Crafty arranges the bits, but I haven't tested it.

Currently, Crafty does not have assembly language level optimizations for PowerPC. Possibly some ambitious author could contribute in that area.
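For anyone who wants to try the same scan off a PowerPC box, here is a minimal portable sketch of the idea. It is not Symbolic's code: the names CountLeadingZeros32, BitBoard64, SetSq and the -1 "no square" return value are made up for illustration, and it assumes the same layout the "(theZC ^ 0x07)" term implies, i.e. square s lives at bit 31 - ((s & 31) ^ 7) of word s / 32. On GCC or Clang the helper uses __builtin_clz, which is undefined for a zero argument, so zero is handled separately to match cntlzw's result of 32.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for cntlzw: leading zero count of a 32-bit word, 32 if the word is 0.
static inline int CountLeadingZeros32(std::uint32_t value)
{
    if (value == 0)
        return 32;
#if defined(__GNUC__) || defined(__clang__)
    // __builtin_clz is undefined for 0; that case was handled above.
    return __builtin_clz(value);
#else
    // Plain fallback loop: shift left until the top bit is set.
    int count = 0;
    while ((value & 0x80000000u) == 0)
    {
        value <<= 1;
        ++count;
    }
    return count;
#endif
}

// Hypothetical two-word bitboard, assumed layout:
// square s is bit 31 - ((s & 31) ^ 7) of words[s / 32].
struct BitBoard64
{
    std::uint32_t words[2];

    // Set the bit for a square (0..63).
    void SetSq(int sq)
    {
        assert(sq >= 0 && sq < 64);
        const int lz = (sq & 31) ^ 0x07;
        words[sq >> 5] |= std::uint32_t(1) << (31 - lz);
    }

    // Return the next occupied square and clear its bit; -1 if none remain.
    int NextSq()
    {
        for (int w = 0; w < 2; ++w)
        {
            const int lz = CountLeadingZeros32(words[w]);
            if (lz != 32)
            {
                words[w] ^= std::uint32_t(1) << (31 - lz);
                return (lz ^ 0x07) + 32 * w;
            }
        }
        return -1;
    }
};
```

With this layout the squares come out rank by rank, and within a rank from the highest file index downward. If the layout were reversed so that square s sat at bit 31 - (s & 31), both XORs with 0x07 would drop out and the return value would be just lz + 32 * w.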
Last modified: Thu, 15 Apr 21 08:11:13 -0700