Author: Gerd Isenberg
Date: 13:09:24 11/27/03
Go up one level in this thread
On November 27, 2003 at 13:48:22, Robert Hyatt wrote:
>On November 27, 2003 at 05:29:43, Gerd Isenberg wrote:
>
>>On November 26, 2003 at 21:18:05, Robert Hyatt wrote:
>>
>>>I have been converting the X86.s file to work in 64 bit mode on the Opteron
>>>system I am playing with. And I must say that after studying the Opteron
>>>64 bit instruction set, I'm impressed.
>>>
>>>First, all the old opcodes work.. mov, sub, bsf, etc..
>>>
>>>Second, the familiar 8 32-bit regs are still there. But they can be
>>>named %rax rather than %eax to stretch them to 64 bits. Cute. And
>>>then there are 8 more registers you can use with the same old opcodes
>>>and addressing modes.
>>>
>>>In short, it's well-thought-out and very easy to use. I'll post some
>>>performance later. I have PopCnt(), FirstOne() and LastOne() working
>>>fine. After I finish the others, I'll see how much (if any) it speeds
>>>things up.
>>
>>Hi Bob,
>>
>>Very curious about your 64-bit results...
>
>First results are not good. FirstOne() and LastOne() in asm are actually
>slower than the normal table-lookup in the C source. I would suspect it has
>to do with (a) bigger cache; (b) bsf/bsr are not fast; (c) the C can be
>inlined while the asm is coded as external procedures...
>
>>
>>I see good chances for Matt Taylor's de Bruijn multiplication to become faster
>>than a single bsf on AMD64, because bsf reg64 is still 9-cycle vector path
>>instruction, but 32*32=64bit or 64*64=128bit became direct path intructions on
>>AMD64 (3/5 cycles). May be you can try it some day...
>
>If you have anything that runs under linux, I can run it on this box easily.
>
Matt Taylor's folding trick with 32-bit de Bruijn multiplication,
or 64-bit deBruijn multiplication. I guess the latter is faster for 64-bit.
A small table lookup is required, to get "real" bit indices.
// Matt Taylor lsb
inline int lsb(BitBoard bb)
{ // leave lsb unchanged, reset all upper, but set all lower bits
bb ^= (bb - 1);
unsigned int foldedLSB = ((int)bb)^((int)(bb>>32));
return lsz64tbl[(foldedLSB * 0x78291ACF) >> (32-6)];
}
// Matt Taylor lsb with reset found bit
inline int bitScanAndReset(BitBoard &bb)
{
BitBoard lsb = bb ^ (bb - 1);
bb &= ~lsb;
unsigned int foldedLSB = ((int) lsb) ^ ((int)(lsb>>32));
return lsz64tbl[(foldedLSB * 0x78291ACF) >> (32-6)];
}
const int lsz64tbl[64] =
{
63,30, 3,32,59,14,11,33,
60,24,50, 9,55,19,21,34,
61,29, 2,53,51,23,41,18,
56,28, 1,43,46,27, 0,35,
62,31,58, 4, 5,49,54, 6,
15,52,12,40, 7,42,45,16,
25,57,48,13,10,39, 8,44,
20,47,38,22,17,37,36,26,
};
64-bit deBruijn multiplication:
static const BitBoard deBruijn = 0x03f08c5392cd5dbd;
// there are a lot others with unique 64 * 6 bit sequences of course
inline int lsb(BitBoard bb)
{
bb &= -bb; // isolate lsb
return lsb64tbl[(bb * deBruijn) >> (64-6)]; // unsigned shift!
}
const int lsb64tbl[64] =
{
0, 1,12, 2,13,22,17, 3,
14,33,23,36,18,42,28, 4,
62,15,34,26,24,46,37,48,
19,39,43,54,29,50,57, 5,
63,11,21,16,32,35,41,27,
61,25,45,47,38,53,49,56,
10,20,31,40,60,44,52,55,
9,30,59,51, 8,58, 7, 6,
};
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.