Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: A data point for PowerPC bitboard program authors

Author: Tord Romstad

Date: 05:19:47 05/10/05

Go up one level in this thread


On May 10, 2005 at 07:23:19, Steven Edwards wrote:

>On May 10, 2005 at 05:41:03, Tord Romstad wrote:
>
>>The cntlzw and cntlzdw instructions are certainly worth a look for
>>bitboarders who run their engines on PowerPC CPUs, but comparing the speed to
>>the table lookup method is not very interesting.  I've found that table lookup
>>is the slowest of all the common bit scanning techniques, on the G4 as well as
>>the G5.
>>
>>I use the deBruijn multiplication trick:
>>
>>const uint32 BitTable[64] = {
>>  0,1,2,7,3,21,16,35,4,49,22,52,17,66,36,80,5,33,50,70,23,86,53,96,18,55,67,
>>  102,37,98,81,113,119,6,20,34,48,51,65,71,32,69,85,87,54,101,97,112,118,19,
>>  39,64,68,84,100,103,117,38,83,99,116,82,115,114
>>};
>>
>>inline unsigned first_1(bitboard_t b) {
>>  return BitTable[((b&-b)*0x218a392cd3d5dbfULL)>>58];
>>}
>>
>>I no longer remember exactly how big the difference in speed between this
>>and the cntlzdw instruction was, but I remember that it was so tiny that
>>there was no point in using inline assembly language.  As always, YMMV.
>
>Perhaps on a G5, but for the 32 bit G4 the above four 64 operations [-, &, *,
>>>] have to be split up (the multiply in particular) and with the table
>reference added, it doesn't look that good.

I think I recall that the deBruijn trick was rather fast on the G4 as well,
but I am not entirely sure.  Dieter once posted a slightly more complicated
version optimized for 32-bit CPUs.  It can probably still be found somewhere
in the archives.

>Also, the above does not map to the
>-1 + 0..63 which I need.  Maybe you are using a 10x12 board like I first saw in
>the 1978 Sargon.

Yes, I forgot that.  I use a 16x16 board.  In order to make my code produce
bit indices in the range 0..63, replace each entry x in BitTable[] with
(x % 16) + (x / 16) * 8.

This is, by the way, one of the reasons I prefer deBruijn sequences or
straightforward table lookups rather than cntlzdw and the like.  The
table-based methods are more versatile in the sense that they allow me
to keep whatever size and shape I prefer for my board array.  When using
cntlzdw with a board array of more than 64 entries, you will have to do
an extra table lookup or a few additional arithmetic operations in order
to get bit indices in the correct format.  This would probably be enough
to offset the performance advantage of inline assembly language.

Tord



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.