Author: Matt Taylor
Date: 03:47:09 01/22/03
On January 22, 2003 at 06:44:08, Dann Corbit wrote:
>On January 22, 2003 at 06:09:22, Dann Corbit wrote:
>
>>On January 22, 2003 at 05:33:20, Matt Taylor wrote:
>>
>>>On January 22, 2003 at 04:27:16, Dann Corbit wrote:
>>>
>>>>On January 22, 2003 at 03:29:05, Matt Taylor wrote:
>>>>
>>>>>On January 21, 2003 at 17:03:33, Dann Corbit wrote:
>>>>>
>>>>>>On January 21, 2003 at 15:48:57, Sander de Zoete wrote:
>>>>>>
>>>>>>>The following instruction I found for the new architecture of Intel Chips
>>>>>>>
>>>>>>>BSWAP—Byte Swap
>>>>>>>
>>>>>>>Description
>>>>>>>Reverses the byte order of a 32-bit (destination) register:
>>>>>>>
>>>>>>>Operation
>>>>>>>TEMP ← DEST
>>>>>>>DEST(7..0) ← TEMP(31..24)
>>>>>>>DEST(15..8) ← TEMP(23..16)
>>>>>>>DEST(23..16) ← TEMP(15..8)
>>>>>>>DEST(31..24) ← TEMP(7..0)
>>>>>>>Flags Affected
>>>>>>>None.
>>>>>>>Opcode       Instruction   Description
>>>>>>>0F C8+ rd    BSWAP r32     Reverses the byte order of a 32-bit register.
>>>>>>>
>>>>>>>It is only valid on the 486 architecture and later.
>>>>>>>
>>>>>>>If this instruction can be used, it should be very easy to turn Reverse
>>>>>>>BitBoards back into Forward Boards. That would save a lot of updating in make
>>>>>>>and unmake move, and it would also make it much faster to use the bitboards for
>>>>>>>generating attack boards for evaluation purposes, in-check tests, etc.
>>>>>>>
>>>>>>>The thing left to do is to reverse the bits per byte.
>>>>>>>
>>>>>>>I use the following trick, but it doesn't seem to work for the value 9. Let me
>>>>>>>show you what I mean: (and of course, can you help me?)
>>>>>>>
>>>>>>>In C code, it looks like this:
>>>>>>>
>>>>>>>typedef unsigned __int64 BITBOARD;
>>>>>>>
>>>>>>>void ReverseBitsPerByte(BITBOARD bitboard)
>>>>>>>{
>>>>>>>//  1.  Load the constant, k = 5555555555555555h
>>>>>>>BITBOARD k = 0x5555555555555555;
>>>>>>>
>>>>>>>//  2.  x = [(x shl 1) and k] or [(x and k) shr 1] result is: EFCDAB8967452301
>>>>>>>bitboard = ((bitboard<<1) & k) | ((bitboard & k)>>1 );
>>>>>>>}
>>>>>>>
>>>>>>>Initial bitboard:
>>>>>>>11111110
>>>>>>>11011100
>>>>>>>10111010
>>>>>>>10011000
>>>>>>>01110110
>>>>>>>01010100
>>>>>>>00110010
>>>>>>>00010000
>>>>>>>
>>>>>>>Result of ReverseBitsPerByte(bitboard):
>>>>>>>01111111
>>>>>>>00111011
>>>>>>>01011101
>>>>>>>00011000<--what the ^*&* goes wrong here? Should be 00011001.
>>>>>>>01101110
>>>>>>>00101010
>>>>>>>01001100
>>>>>>>00001000
>>>>>>>
>>>>>>>Thanks for any help or suggestions.
>>>>>>
>>>>><snip code>
>>>>>
>>>>>These are all basically the same algorithm. It's going to take a while to
>>>>>rearrange everything when you do it bit-by-bit. Sander has the right idea by
>>>>>first swapping bytes and then trying to reverse the bits within the bytes (8*2
>>>>>ops + the swap as opposed to 64*2 ops).
>>>>
>>>>The algorithms don't perform identically.
>>>>They don't require 64 ops.
>>>>Once you get the "Sander" algorithm completed, compare the two (and also the
>>>>COBRA algorithm).
>>>
>>>Yes, breverse5 is a divide-and-conquer algorithm. The others all iterate 64
>>>times. Assuming 2 ops/iteration was a gross underestimate as they use heavy
>>>masking and shifting.
>>>
>>>2 bswaps (and probably a mov, too) will reverse the bytes; afterward, the bits
>>>can be reversed as well.
>>>
>>>Not sure when or if I can have a look at COBRA and think about the algorithm --
>>>I've got more than enough mandatory work right now.
>>
>>The COBRA algorithm is designed for big bit sequences.  I don't know how well it
>>adapts to 64 bits.  I have only just downloaded the paper.
>>
>>This algorithm:
>>
>>typedef unsigned long long Bitboard;
>>
>>Bitboard        breverse5(Bitboard n)
>>{
>>    int             i = 64;
>>    Bitboard        m = -1;
>>    /* i runs through 32, 16, 8, 4, 2, 1; m masks the low i bits of each 2*i-bit group */
>>    while (i /= 2)
>>        m ^= m << i, n = n >> i & m | (n & m) << i;
>>    return n;
>>}
>>
>>iterates 6 times.  I suspect it will be faster than something table driven.
>>Perhaps also faster than 8 mask and shift operations.
>>
>>I'll be surprised if anything can beat it by 20%.
>
>Here is the generated assembly language.  It would look a lot prettier on a 64
>bit chip, for sure.
<snip>
Yes. Note:
0006a e8 fc ff ff ff   call __allshl                          ;C:\tmp\brev.cpp
                       ^ Yuck.
This is why parallel 32-bit w/bswap is much faster.
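
For reference, a minimal sketch of the parallel 32-bit idea in portable C. The function names (bswap32, reverse_bits_per_byte32, reverse64_parallel32) are made up for illustration and are not from anyone's engine. It assumes the compiler turns the shift-and-mask byte swap into a single BSWAP on x86, which is common but not guaranteed. Note that reversing the bits inside each byte takes three swap steps (adjacent bits, bit pairs, nibbles), not just the single 0x55 step, and that each step shifts one half right and the other half left.

```c
#include <stdint.h>

/* Byte-swap one 32-bit word. On x86 a compiler may reduce this to one BSWAP. */
static uint32_t bswap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Reverse the bit order inside each byte of a 32-bit word. */
static uint32_t reverse_bits_per_byte32(uint32_t x)
{
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1); /* swap adjacent bits */
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2); /* swap bit pairs */
    x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4); /* swap nibbles */
    return x;
}

/* Full 64-bit reversal built only from 32-bit operations. */
static uint64_t reverse64_parallel32(uint64_t b)
{
    uint32_t lo = (uint32_t)b;
    uint32_t hi = (uint32_t)(b >> 32);

    lo = reverse_bits_per_byte32(bswap32(lo));
    hi = reverse_bits_per_byte32(bswap32(hi));

    /* The reversed low half becomes the new high half, and vice versa. */
    return ((uint64_t)lo << 32) | hi;
}
```

Because every shift here works on a 32-bit value, a 32-bit compiler has no reason to call a 64-bit shift helper; the only 64-bit work is splitting and rejoining the halves.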
If you want, I can time your algorithm. It will be quite nasty (because of
inefficient 64-bit shifts), I assure you. Each call is 8 clocks or more.
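
For anyone who wants to measure it themselves, a rough timing harness could look like the sketch below. The names reverse_fn and time_reverse are hypothetical. clock() is coarse, so a large iteration count is needed, and the numbers will depend heavily on compiler, flags, and CPU. breverse5 is repeated here (with explicit parentheses) only so the block compiles on its own.

```c
#include <stdint.h>
#include <time.h>

typedef uint64_t Bitboard;
typedef Bitboard (*reverse_fn)(Bitboard);

/* The divide-and-conquer loop from the quoted post, with explicit parentheses. */
static Bitboard breverse5(Bitboard n)
{
    int i = 64;
    Bitboard m = ~(Bitboard)0;
    while (i /= 2) {
        m ^= m << i;
        n = ((n >> i) & m) | ((n & m) << i);
    }
    return n;
}

/* Hypothetical helper: call fn `iterations` times, feeding each result
   (plus one) back in so the calls depend on each other. Stores the elapsed
   CPU time in *seconds and returns the final value as a checksum. */
static Bitboard time_reverse(reverse_fn fn, Bitboard seed,
                             long iterations, double *seconds)
{
    clock_t start = clock();
    Bitboard x = seed;
    long k;

    for (k = 0; k < iterations; ++k)
        x = fn(x) + 1;

    *seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    return x;
}
```

Calling time_reverse(breverse5, 0, 100000000L, &secs) and then the same with another reversal routine gives a crude comparison; matching checksums also confirm the two routines agree.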
-Matt
Last modified: Thu, 15 Apr 21 08:11:13 -0700