Computer Chess Club Archives


Subject: Re: A data point for PowerPC bitboard program authors

Author: Tony Werten

Date: 04:30:41 05/10/05


On May 10, 2005 at 07:23:19, Steven Edwards wrote:

>On May 10, 2005 at 05:41:03, Tord Romstad wrote:
>
>>The cntlzw and cntlzd instructions are certainly worth a look for
>>bitboarders who run their engines on PowerPC CPUs, but comparing the speed to
>>the table lookup method is not very interesting.  I've found that table lookup
>>is the slowest of all the common bit scanning techniques, on the G4 as well as
>>the G5.
>>
>>I use the de Bruijn multiplication trick:
>>
>>const uint32 BitTable[64] = {
>>  0,1,2,7,3,21,16,35,4,49,22,52,17,66,36,80,5,33,50,70,23,86,53,96,18,55,67,
>>  102,37,98,81,113,119,6,20,34,48,51,65,71,32,69,85,87,54,101,97,112,118,19,
>>  39,64,68,84,100,103,117,38,83,99,116,82,115,114
>>};
>>
>>inline unsigned first_1(bitboard_t b) {
>>  return BitTable[((b&-b)*0x218a392cd3d5dbfULL)>>58];
>>}
>>
>>I no longer remember exactly how big the difference in speed between this
>>and the cntlzd instruction was, but I remember that it was so tiny that
>>there was no point in using inline assembly language.  As always, YMMV.
>
>Perhaps on a G5, but on the 32-bit G4 the four 64-bit operations above
>(-, &, *, >>) have to be split up (the multiply in particular), and with the
>table reference added, it doesn't look that good.  Also, the above does not map
>to the range I need (-1 plus 0..63).  Maybe you are using a 10x12 board like
>the one I first saw in the 1978 Sargon.

No, it's 0x88 (8*16).
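
To make that concrete: every entry in the quoted table has the form
16*rank + file, which is 0x88 numbering.  Below is a rough sketch, not taken
from Tord's engine.  It assumes uint32 and bitboard_t are plain 32-bit and
64-bit unsigned types, and the helpers sq64_to_0x88, sq0x88_to_64, first_1_64
and count_mismatches are names I made up for illustration.  first_1_64 folds
the result back to 0..63 and returns -1 for an empty board, which I assume is
the range Steven is after.

```c
#include <stdint.h>

typedef uint64_t bitboard_t;   /* assumption: 64-bit unsigned bitboard */

/* Table from the quoted post: de Bruijn slot -> 0x88 square. */
static const uint32_t BitTable[64] = {
  0,1,2,7,3,21,16,35,4,49,22,52,17,66,36,80,5,33,50,70,23,86,53,96,18,55,67,
  102,37,98,81,113,119,6,20,34,48,51,65,71,32,69,85,87,54,101,97,112,118,19,
  39,64,68,84,100,103,117,38,83,99,116,82,115,114
};

/* Lowest set bit of b as a 0x88 square.  b must be nonzero. */
static inline unsigned first_1(bitboard_t b) {
  return BitTable[((b & -b) * 0x218a392cd3d5dbfULL) >> 58];
}

/* Hypothetical helper: 0..63 index (8*rank + file) -> 0x88 (16*rank + file). */
static inline unsigned sq64_to_0x88(unsigned sq) {
  return sq + (sq & ~7u);
}

/* Hypothetical helper: 0x88 square (16*rank + file) -> 0..63 index. */
static inline unsigned sq0x88_to_64(unsigned sq) {
  return (sq + (sq & 7u)) >> 1;
}

/* Hypothetical wrapper: 0..63 for the lowest set bit, -1 for an empty board. */
static inline int first_1_64(bitboard_t b) {
  return b ? (int)sq0x88_to_64(first_1(b)) : -1;
}

/* Counts single-bit boards where the table disagrees with the 0x88 formula
   or where the folded result is not the bit number. */
static int count_mismatches(void) {
  int bad = 0;
  for (unsigned i = 0; i < 64; i++) {
    bitboard_t b = (bitboard_t)1 << i;
    if (first_1(b) != sq64_to_0x88(i) || first_1_64(b) != (int)i)
      bad++;
  }
  return bad;
}
```

If count_mismatches() returns 0, the table really is the 0x88 numbering of
bits 0..63.  The extra add, mask and shift in first_1_64 are the price of
folding back to 0..63; a table that stores 0..63 directly would avoid them.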

Tony




Last modified: Thu, 15 Apr 21 08:11:13 -0700
