Author: Tord Romstad
Date: 02:41:03 05/10/05
Go up one level in this thread
On May 09, 2005 at 07:48:51, Steven Edwards wrote: >Using the above code vs. the table lookup model results in a speed increase of >about 16% when running movepath enumeration (i.e., perft). Overall speedup is >somewhat less although still significant. The cntlzw and cntlzdw instructions are certainly worth a look for bitboarders who run their engines on PowerPC CPUs, but comparing the speed to the table lookup method is not very interesting. I've found that table lookup is the slowest of all the common bit scanning techniques, on the G4 as well as the G5. I use the deBruijn multiplication trick: const uint32 BitTable[64] = { 0,1,2,7,3,21,16,35,4,49,22,52,17,66,36,80,5,33,50,70,23,86,53,96,18,55,67, 102,37,98,81,113,119,6,20,34,48,51,65,71,32,69,85,87,54,101,97,112,118,19, 39,64,68,84,100,103,117,38,83,99,116,82,115,114 }; inline unsigned first_1(bitboard_t b) { return BitTable[((b&-b)*0x218a392cd3d5dbfULL)>>58]; } I no longer remember exactly how big the difference in speed between this and the cntlzdw instruction was, but I remember that it was so tiny that there was no point in using inline assembly language. As always, YMMV. Tord
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.