Author: Steven Edwards
Date: 04:23:19 05/10/05
On May 10, 2005 at 05:41:03, Tord Romstad wrote:

>The cntlzw and cntlzdw instructions are certainly worth a look for
>bitboarders who run their engines on PowerPC CPUs, but comparing the speed to
>the table lookup method is not very interesting. I've found that table lookup
>is the slowest of all the common bit scanning techniques, on the G4 as well as
>the G5.
>
>I use the deBruijn multiplication trick:
>
>const uint32 BitTable[64] = {
>  0,1,2,7,3,21,16,35,4,49,22,52,17,66,36,80,5,33,50,70,23,86,53,96,18,55,67,
>  102,37,98,81,113,119,6,20,34,48,51,65,71,32,69,85,87,54,101,97,112,118,19,
>  39,64,68,84,100,103,117,38,83,99,116,82,115,114
>};
>
>inline unsigned first_1(bitboard_t b) {
>  return BitTable[((b&-b)*0x218a392cd3d5dbfULL)>>58];
>}
>
>I no longer remember exactly how big the difference in speed between this
>and the cntlzdw instruction was, but I remember that it was so tiny that
>there was no point in using inline assembly language. As always, YMMV.

Perhaps on a G5, but on the 32-bit G4 the four 64-bit operations above [-, &, *, >>] have to be split up (the multiply in particular), and with the table reference added, it doesn't look that good.

Also, the above does not map to the -1 + 0..63 range which I need. Maybe you are using a 10x12 board like the one I first saw in the 1978 Sargon.
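For anyone following along, here is a rough sketch of the two approaches being compared, both returning -1 for an empty bitboard and 0..63 otherwise. It is not code from either poster. The de Bruijn constant is the one from the quoted post, but the table is rebuilt at start-up to hold plain 0..63 indices (the quoted table holds values up to 119). The names init_bit_index, first_1_debruijn and first_1_clz32 are made up for illustration, and __builtin_clz (a GCC/Clang builtin) is only a stand-in for a cntlzw-style instruction, so treat the second routine as an assumption about how a 32-bit version might look.

```c
#include <stdint.h>

/* De Bruijn constant taken from the quoted post. */
#define DEBRUIJN64 0x218a392cd3d5dbfULL

/* Hypothetical lookup table, filled at start-up with plain 0..63 indices. */
int bit_index[64];

/* A single bit at position i, multiplied by the constant, leaves a
   distinct 6-bit pattern in the top bits; record i under that pattern. */
void init_bit_index(void) {
    for (int i = 0; i < 64; i++)
        bit_index[((1ULL << i) * DEBRUIJN64) >> 58] = i;
}

/* De Bruijn version: index of the lowest set bit, or -1 if b is empty.
   Call init_bit_index() once before using this. */
int first_1_debruijn(uint64_t b) {
    if (b == 0)
        return -1;
    return bit_index[((b & (0ULL - b)) * DEBRUIJN64) >> 58];
}

/* 32-bit version: look at the low half first, then the high half.
   Isolate the lowest set bit, then count leading zeros to get its index. */
int first_1_clz32(uint64_t b) {
    uint32_t lo = (uint32_t)b;
    uint32_t hi = (uint32_t)(b >> 32);
    if (lo != 0)
        return 31 - __builtin_clz(lo & (0u - lo));
    if (hi != 0)
        return 63 - __builtin_clz(hi & (0u - hi));
    return -1;
}
```

The second routine only ever touches 32-bit values, which is the point about the G4: the 64-bit negate, and, multiply and shift in the first routine all have to be synthesized from 32-bit instructions there. Which one wins in practice would have to be measured.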
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.