Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: An Opteron note

Author: Gerd Isenberg

Date: 13:09:24 11/27/03

Go up one level in this thread


On November 27, 2003 at 13:48:22, Robert Hyatt wrote:

>On November 27, 2003 at 05:29:43, Gerd Isenberg wrote:
>
>>On November 26, 2003 at 21:18:05, Robert Hyatt wrote:
>>
>>>I have been converting the X86.s file to work in 64 bit mode on the Opteron
>>>system I am playing with.  And I must say that after studying the Opteron
>>>64 bit instruction set, I'm impressed.
>>>
>>>First, all the old opcodes work..   mov, sub, bsf, etc..
>>>
>>>Second, the familiar 8 32-bit regs are still there.  But they can be
>>>named %rax rather than %eax to stretch them to 64 bits.  Cute.  And
>>>then there are 8 more registers you can use with the same old opcodes
>>>and addressing modes.
>>>
>>>In short, it's well-thought-out and very easy to use.  I'll post some
>>>performance later.  I have PopCnt(), FirstOne() and LastOne() working
>>>fine.  After I finish the others, I'll see how much (if any) it speeds
>>>things up.
>>
>>Hi Bob,
>>
>>Very curious about your 64-bit results...
>
>First results are not good.  FirstOne() and LastOne() in asm are actually
>slower than the normal table-lookup in the C source.  I would suspect it has
>to do with (a) bigger cache; (b) bsf/bsr are not fast; (c) the C can be
>inlined while the asm is coded as external procedures...
>
>>
>>I see good chances for Matt Taylor's de Bruijn multiplication to become faster
>>than a single bsf on AMD64, because bsf reg64 is still 9-cycle vector path
>>instruction, but 32*32=64bit or 64*64=128bit became direct path intructions on
>>AMD64 (3/5 cycles). May be you can try it some day...
>
>If you have anything that runs under linux, I can run it on this box easily.
>



Matt Taylor's folding trick with 32-bit de Bruijn multiplication,
or 64-bit deBruijn multiplication. I guess the latter is faster for 64-bit.
A small table lookup is required, to get "real" bit indices.


// Matt Taylor lsb
inline int lsb(BitBoard bb)
{   // leave lsb unchanged, reset all upper, but set all lower bits
    bb ^= (bb - 1);
    unsigned int foldedLSB = ((int)bb)^((int)(bb>>32));
    return lsz64tbl[(foldedLSB * 0x78291ACF) >> (32-6)];
}

// Matt Taylor lsb with reset found bit
inline int bitScanAndReset(BitBoard &bb)
{
    BitBoard lsb = bb ^ (bb - 1);
    bb &= ~lsb;
    unsigned int foldedLSB = ((int) lsb) ^ ((int)(lsb>>32));
    return lsz64tbl[(foldedLSB * 0x78291ACF) >> (32-6)];
}

const int lsz64tbl[64] =
{
 63,30, 3,32,59,14,11,33,
 60,24,50, 9,55,19,21,34,
 61,29, 2,53,51,23,41,18,
 56,28, 1,43,46,27, 0,35,
 62,31,58, 4, 5,49,54, 6,
 15,52,12,40, 7,42,45,16,
 25,57,48,13,10,39, 8,44,
 20,47,38,22,17,37,36,26,
};



64-bit deBruijn multiplication:

static const BitBoard deBruijn = 0x03f08c5392cd5dbd;
// there are a lot others with unique 64 * 6 bit sequences of course

inline int lsb(BitBoard bb)
{
    bb &= -bb; // isolate lsb
    return lsb64tbl[(bb * deBruijn) >> (64-6)]; // unsigned shift!
}

const int lsb64tbl[64] =
{
  0, 1,12, 2,13,22,17, 3,
 14,33,23,36,18,42,28, 4,
 62,15,34,26,24,46,37,48,
 19,39,43,54,29,50,57, 5,
 63,11,21,16,32,35,41,27,
 61,25,45,47,38,53,49,56,
 10,20,31,40,60,44,52,55,
  9,30,59,51, 8,58, 7, 6,
};






This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.