Computer Chess Club Archives


Search

Terms

Messages

Subject: Hmm, same algorithm!

Author: Matt Taylor

Date: 03:44:45 01/22/03

Go up one level in this thread


On January 22, 2003 at 06:35:19, Matt Taylor wrote:

>On January 22, 2003 at 06:09:22, Dann Corbit wrote:
>
>>On January 22, 2003 at 05:33:20, Matt Taylor wrote:
>>
>>>On January 22, 2003 at 04:27:16, Dann Corbit wrote:
>>>
>>>>On January 22, 2003 at 03:29:05, Matt Taylor wrote:
>>>>
>>>>>On January 21, 2003 at 17:03:33, Dann Corbit wrote:
>>>>>
>>>>>>On January 21, 2003 at 15:48:57, Sander de Zoete wrote:
>>>>>>
>>>>>>>The following instruction I found for the new architecture of Intel Chips
>>>>>>>
>>>>>>>BSWAP—Byte Swap
>>>>>>>
>>>>>>>Description
>>>>>>>Reverses the byte order of a 32-bit (destination) register:
>>>>>>>
>>>>>>>Operation
>>>>>>>TEMP ¬ DEST
>>>>>>>DEST(7..0) ¬ TEMP(31..24)
>>>>>>>DEST(15..8) ¬ TEMP(23..16)
>>>>>>>DEST(23..16) ¬ TEMP(15..8)
>>>>>>>DEST(31..24) ¬ TEMP(7..0)
>>>>>>>Flags Affected
>>>>>>>None.
>>>>>>>Opcode Instruction Description
>>>>>>>0F C8+ rd BSWAP r32 Reverses the byte order of a 32-bit register.
>>>>>>>
>>>>>>>It is only valid for 486 architecture.
>>>>>>>
>>>>>>>If this instruction can be used, it should be very easy to reverse Reverse
>>>>>>>BitBoards into Forward Boards again. Saving a lot of updating in make and unmake
>>>>>>>move and also be much faster using the bitboards for generating attack
>>>>>>>board for evaluation purposes, incheck checks etc.
>>>>>>>
>>>>>>>The thing left to do is to reverse the bits per byte.
>>>>>>>
>>>>>>>I use the following trick, but it doesn't seem to work voor the value 9. Let me
>>>>>>>show you what I mean: (and ofcourse, can you help me?)
>>>>>>>
>>>>>>>In C code, it looks like this:
>>>>>>>
>>>>>>>typedef unsigned __int64 BITBOARD;
>>>>>>>
>>>>>>>void ReverseBitsPerByte(BITBOARD bitboard)
>>>>>>>{
>>>>>>>//  1.  Load the constant, k = 5555555555555555h
>>>>>>>BITBOARD k = 0x5555555555555555;
>>>>>>>
>>>>>>>//  2.  x = [(x shl 1) and k] or [(x and k) shr 1] result is: EFCDAB8967452301
>>>>>>>bitboard = ((bitboard<<1) & k) | ((bitboard & k)>>1 );
>>>>>>>}
>>>>>>>
>>>>>>>Initial bitboard:
>>>>>>>11111110
>>>>>>>11011100
>>>>>>>10111010
>>>>>>>10011000
>>>>>>>01110110
>>>>>>>01010100
>>>>>>>00110010
>>>>>>>00010000
>>>>>>>
>>>>>>>Result of ReverseBitsPerByte(bitboard):
>>>>>>>01111111
>>>>>>>00111011
>>>>>>>01011101
>>>>>>>00011000<--what the ^*&* goes wrong here? Should be 00011001.
>>>>>>>01101110
>>>>>>>00101010
>>>>>>>01001100
>>>>>>>00001000
>>>>>>>
>>>>>>>Thanks for any help or suggestions.
>>>>>>
>>>>><snip code>
>>>>>
>>>>>These are all basically the same algorithm. It's going to take a while to
>>>>>rearrange everything when you do it bit-by-bit. Sander has the right idea by
>>>>>first swapping bytes and then trying to reverse the bits within the bytes (8*2
>>>>>ops + the swap as opposed to 64*2 ops).
>>>>
>>>>The algorithms don't perform identically.
>>>>They don't require 64 ops.
>>>>Once you get the "Sander" algorithm completed, compare the two (and also the
>>>>COBRA algorithm).
>>>
>>>Yes, breverse5 is a divide-and-conquer algorithm. The others all iterate 64
>>>times. Assuming 2 ops/iteration was a gross underestimate as they use heavy
>>>masking and shifting.
>>>
>>>2 bswaps (and probably a mov, too) will reverse the bytes; afterward, the bits
>>>can be reversed as well.
>>>
>>>Not sure when or if I can have a look at COBRA and think about the algorithm --
>>>I've got more than enough mandatory work right now.
>>
>>The COBRA algorithm is designed for big bit sequences.  I don't know how well it
>>adapts to 64 bits.  I have only just downloaded the paper.
>>
>>This algorithm:
>>
>>typedef unsigned long ling Bitboard;
>>
>>Bitboard        breverse5(Bitboard n)
>>{
>>    int             i = 64;
>>    Bitboard        m = -1;
>>    while (i /= 2)
>>        m ^= m << i, n = n >> i & m | (n & m) << i;
>>    return n;
>>}
>>
>>iterates 6 times.  I suspect it will be faster than something table driven.
>>Perhaps also faster than 8 mask and shift operations.
>>
>>I'll be surprised if anything can beat it by 20%.
>
>Is that a challenge? :-)
>
>Grr...I have so many things I need to get done...and now you go and entice me to
>do more optimization!
>
>-Matt

Ah, the link Gerd had posted before gives this bit reverse algorithm:
unsigned int reverse(register unsigned int x)
{
	x = (((x & 0xaaaaaaaa) >> 1) | ((x & 0x55555555) << 1));
	x = (((x & 0xcccccccc) >> 2) | ((x & 0x33333333) << 2));
	x = (((x & 0xf0f0f0f0) >> 4) | ((x & 0x0f0f0f0f) << 4));
	x = (((x & 0xff00ff00) >> 8) | ((x & 0x00ff00ff) << 8));
	return((x >> 16) | (x << 16));

}

I just realized that it's basically an unrolled 32-bit version of the algorithm
you gave. It is -much- faster to bswap for the i=32, i=16, and i=8 cases. Two
bswaps, a mov, and an extra register will do those three iterations in-place in
2 cycles. The other iterations are more expensive (32-bit = 4 cycles, 6
instructions), but you can overlap it with the initial 3 iterations to reclaim
the mov & 1 cycle. Also, overlapping 32-bit portions of the 64-bit value allow
you to do them in parallel.

I have not tested this yet, but I have a version that takes 14 cycles.

-Matt



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.