Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: How to reverse a Reverse BitBoard!

Author: Dann Corbit

Date: 04:00:30 01/22/03

Go up one level in this thread


On January 22, 2003 at 06:47:09, Matt Taylor wrote:

>On January 22, 2003 at 06:44:08, Dann Corbit wrote:
>
>>On January 22, 2003 at 06:09:22, Dann Corbit wrote:
>>
>>>On January 22, 2003 at 05:33:20, Matt Taylor wrote:
>>>
>>>>On January 22, 2003 at 04:27:16, Dann Corbit wrote:
>>>>
>>>>>On January 22, 2003 at 03:29:05, Matt Taylor wrote:
>>>>>
>>>>>>On January 21, 2003 at 17:03:33, Dann Corbit wrote:
>>>>>>
>>>>>>>On January 21, 2003 at 15:48:57, Sander de Zoete wrote:
>>>>>>>
>>>>>>>>The following instruction I found for the new architecture of Intel Chips
>>>>>>>>
>>>>>>>>BSWAP—Byte Swap
>>>>>>>>
>>>>>>>>Description
>>>>>>>>Reverses the byte order of a 32-bit (destination) register:
>>>>>>>>
>>>>>>>>Operation
>>>>>>>>TEMP ¬ DEST
>>>>>>>>DEST(7..0) ¬ TEMP(31..24)
>>>>>>>>DEST(15..8) ¬ TEMP(23..16)
>>>>>>>>DEST(23..16) ¬ TEMP(15..8)
>>>>>>>>DEST(31..24) ¬ TEMP(7..0)
>>>>>>>>Flags Affected
>>>>>>>>None.
>>>>>>>>Opcode Instruction Description
>>>>>>>>0F C8+ rd BSWAP r32 Reverses the byte order of a 32-bit register.
>>>>>>>>
>>>>>>>>It is only valid for 486 architecture.
>>>>>>>>
>>>>>>>>If this instruction can be used, it should be very easy to reverse Reverse
>>>>>>>>BitBoards into Forward Boards again. Saving a lot of updating in make and unmake
>>>>>>>>move and also be much faster using the bitboards for generating attack
>>>>>>>>board for evaluation purposes, incheck checks etc.
>>>>>>>>
>>>>>>>>The thing left to do is to reverse the bits per byte.
>>>>>>>>
>>>>>>>>I use the following trick, but it doesn't seem to work voor the value 9. Let me
>>>>>>>>show you what I mean: (and ofcourse, can you help me?)
>>>>>>>>
>>>>>>>>In C code, it looks like this:
>>>>>>>>
>>>>>>>>typedef unsigned __int64 BITBOARD;
>>>>>>>>
>>>>>>>>void ReverseBitsPerByte(BITBOARD bitboard)
>>>>>>>>{
>>>>>>>>//  1.  Load the constant, k = 5555555555555555h
>>>>>>>>BITBOARD k = 0x5555555555555555;
>>>>>>>>
>>>>>>>>//  2.  x = [(x shl 1) and k] or [(x and k) shr 1] result is: EFCDAB8967452301
>>>>>>>>bitboard = ((bitboard<<1) & k) | ((bitboard & k)>>1 );
>>>>>>>>}
>>>>>>>>
>>>>>>>>Initial bitboard:
>>>>>>>>11111110
>>>>>>>>11011100
>>>>>>>>10111010
>>>>>>>>10011000
>>>>>>>>01110110
>>>>>>>>01010100
>>>>>>>>00110010
>>>>>>>>00010000
>>>>>>>>
>>>>>>>>Result of ReverseBitsPerByte(bitboard):
>>>>>>>>01111111
>>>>>>>>00111011
>>>>>>>>01011101
>>>>>>>>00011000<--what the ^*&* goes wrong here? Should be 00011001.
>>>>>>>>01101110
>>>>>>>>00101010
>>>>>>>>01001100
>>>>>>>>00001000
>>>>>>>>
>>>>>>>>Thanks for any help or suggestions.
>>>>>>>
>>>>>><snip code>
>>>>>>
>>>>>>These are all basically the same algorithm. It's going to take a while to
>>>>>>rearrange everything when you do it bit-by-bit. Sander has the right idea by
>>>>>>first swapping bytes and then trying to reverse the bits within the bytes (8*2
>>>>>>ops + the swap as opposed to 64*2 ops).
>>>>>
>>>>>The algorithms don't perform identically.
>>>>>They don't require 64 ops.
>>>>>Once you get the "Sander" algorithm completed, compare the two (and also the
>>>>>COBRA algorithm).
>>>>
>>>>Yes, breverse5 is a divide-and-conquer algorithm. The others all iterate 64
>>>>times. Assuming 2 ops/iteration was a gross underestimate as they use heavy
>>>>masking and shifting.
>>>>
>>>>2 bswaps (and probably a mov, too) will reverse the bytes; afterward, the bits
>>>>can be reversed as well.
>>>>
>>>>Not sure when or if I can have a look at COBRA and think about the algorithm --
>>>>I've got more than enough mandatory work right now.
>>>
>>>The COBRA algorithm is designed for big bit sequences.  I don't know how well it
>>>adapts to 64 bits.  I have only just downloaded the paper.
>>>
>>>This algorithm:
>>>
>>>typedef unsigned long ling Bitboard;
>>>
>>>Bitboard        breverse5(Bitboard n)
>>>{
>>>    int             i = 64;
>>>    Bitboard        m = -1;
>>>    while (i /= 2)
>>>        m ^= m << i, n = n >> i & m | (n & m) << i;
>>>    return n;
>>>}
>>>
>>>iterates 6 times.  I suspect it will be faster than something table driven.
>>>Perhaps also faster than 8 mask and shift operations.
>>>
>>>I'll be surprised if anything can beat it by 20%.
>>
>>Here is the generated assembly language.  It would look a lot prettier on a 64
>>bit chip, for sure.
><snip>
>
>Yes. Note:
>0006a e8 fc ff ff ff   call __allshl                          ;C:\tmp\brev.cpp
>                       ^ Yuck.
>
>This is why parallel 32-bit w/bswap is much faster.
>
>If you want, I can time your algorithm. It will be quite nasty (because of
>inefficient 64-bit shifts), I assure you. Each call is 8 clocks or more.

This will be the rough alternative using the suggested algorithm:

unsigned int
reverse32(register unsigned int x)
{
        register unsigned int y = 0x55555555;
        x = (((x >> 1) & y) | ((x & y) << 1));
        y = 0x33333333;
        x = (((x >> 2) & y) | ((x & y) << 2));
        y = 0x0f0f0f0f;
        x = (((x >> 4) & y) | ((x & y) << 4));
        y = 0x00ff00ff;
        x = (((x >> 8) & y) | ((x & y) << 8));
        return((x >> 16) | (x << 16));
}

typedef union tag_faker {
   unsigned int a[2];
   unsigned _int64 b;
} faker;

unsigned __int64
reverse64(faker x)
{
   unsigned int a = reverse32(x.a[0]);
   x.a[0] = reverse32(x.a[1]);
   x.a[1] = a;
   return x.b;
}

When we get chips with real 64 bit CPUs, the speed will pick up a ton.



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.