Author: Matt Taylor
Date: 03:44:45 01/22/03
Go up one level in this thread
On January 22, 2003 at 06:35:19, Matt Taylor wrote: >On January 22, 2003 at 06:09:22, Dann Corbit wrote: > >>On January 22, 2003 at 05:33:20, Matt Taylor wrote: >> >>>On January 22, 2003 at 04:27:16, Dann Corbit wrote: >>> >>>>On January 22, 2003 at 03:29:05, Matt Taylor wrote: >>>> >>>>>On January 21, 2003 at 17:03:33, Dann Corbit wrote: >>>>> >>>>>>On January 21, 2003 at 15:48:57, Sander de Zoete wrote: >>>>>> >>>>>>>The following instruction I found for the new architecture of Intel Chips >>>>>>> >>>>>>>BSWAP—Byte Swap >>>>>>> >>>>>>>Description >>>>>>>Reverses the byte order of a 32-bit (destination) register: >>>>>>> >>>>>>>Operation >>>>>>>TEMP ¬ DEST >>>>>>>DEST(7..0) ¬ TEMP(31..24) >>>>>>>DEST(15..8) ¬ TEMP(23..16) >>>>>>>DEST(23..16) ¬ TEMP(15..8) >>>>>>>DEST(31..24) ¬ TEMP(7..0) >>>>>>>Flags Affected >>>>>>>None. >>>>>>>Opcode Instruction Description >>>>>>>0F C8+ rd BSWAP r32 Reverses the byte order of a 32-bit register. >>>>>>> >>>>>>>It is only valid for 486 architecture. >>>>>>> >>>>>>>If this instruction can be used, it should be very easy to reverse Reverse >>>>>>>BitBoards into Forward Boards again. Saving a lot of updating in make and unmake >>>>>>>move and also be much faster using the bitboards for generating attack >>>>>>>board for evaluation purposes, incheck checks etc. >>>>>>> >>>>>>>The thing left to do is to reverse the bits per byte. >>>>>>> >>>>>>>I use the following trick, but it doesn't seem to work voor the value 9. Let me >>>>>>>show you what I mean: (and ofcourse, can you help me?) >>>>>>> >>>>>>>In C code, it looks like this: >>>>>>> >>>>>>>typedef unsigned __int64 BITBOARD; >>>>>>> >>>>>>>void ReverseBitsPerByte(BITBOARD bitboard) >>>>>>>{ >>>>>>>// 1. Load the constant, k = 5555555555555555h >>>>>>>BITBOARD k = 0x5555555555555555; >>>>>>> >>>>>>>// 2. x = [(x shl 1) and k] or [(x and k) shr 1] result is: EFCDAB8967452301 >>>>>>>bitboard = ((bitboard<<1) & k) | ((bitboard & k)>>1 ); >>>>>>>} >>>>>>> >>>>>>>Initial bitboard: >>>>>>>11111110 >>>>>>>11011100 >>>>>>>10111010 >>>>>>>10011000 >>>>>>>01110110 >>>>>>>01010100 >>>>>>>00110010 >>>>>>>00010000 >>>>>>> >>>>>>>Result of ReverseBitsPerByte(bitboard): >>>>>>>01111111 >>>>>>>00111011 >>>>>>>01011101 >>>>>>>00011000<--what the ^*&* goes wrong here? Should be 00011001. >>>>>>>01101110 >>>>>>>00101010 >>>>>>>01001100 >>>>>>>00001000 >>>>>>> >>>>>>>Thanks for any help or suggestions. >>>>>> >>>>><snip code> >>>>> >>>>>These are all basically the same algorithm. It's going to take a while to >>>>>rearrange everything when you do it bit-by-bit. Sander has the right idea by >>>>>first swapping bytes and then trying to reverse the bits within the bytes (8*2 >>>>>ops + the swap as opposed to 64*2 ops). >>>> >>>>The algorithms don't perform identically. >>>>They don't require 64 ops. >>>>Once you get the "Sander" algorithm completed, compare the two (and also the >>>>COBRA algorithm). >>> >>>Yes, breverse5 is a divide-and-conquer algorithm. The others all iterate 64 >>>times. Assuming 2 ops/iteration was a gross underestimate as they use heavy >>>masking and shifting. >>> >>>2 bswaps (and probably a mov, too) will reverse the bytes; afterward, the bits >>>can be reversed as well. >>> >>>Not sure when or if I can have a look at COBRA and think about the algorithm -- >>>I've got more than enough mandatory work right now. >> >>The COBRA algorithm is designed for big bit sequences. I don't know how well it >>adapts to 64 bits. I have only just downloaded the paper. >> >>This algorithm: >> >>typedef unsigned long ling Bitboard; >> >>Bitboard breverse5(Bitboard n) >>{ >> int i = 64; >> Bitboard m = -1; >> while (i /= 2) >> m ^= m << i, n = n >> i & m | (n & m) << i; >> return n; >>} >> >>iterates 6 times. I suspect it will be faster than something table driven. >>Perhaps also faster than 8 mask and shift operations. >> >>I'll be surprised if anything can beat it by 20%. > >Is that a challenge? :-) > >Grr...I have so many things I need to get done...and now you go and entice me to >do more optimization! > >-Matt Ah, the link Gerd had posted before gives this bit reverse algorithm: unsigned int reverse(register unsigned int x) { x = (((x & 0xaaaaaaaa) >> 1) | ((x & 0x55555555) << 1)); x = (((x & 0xcccccccc) >> 2) | ((x & 0x33333333) << 2)); x = (((x & 0xf0f0f0f0) >> 4) | ((x & 0x0f0f0f0f) << 4)); x = (((x & 0xff00ff00) >> 8) | ((x & 0x00ff00ff) << 8)); return((x >> 16) | (x << 16)); } I just realized that it's basically an unrolled 32-bit version of the algorithm you gave. It is -much- faster to bswap for the i=32, i=16, and i=8 cases. Two bswaps, a mov, and an extra register will do those three iterations in-place in 2 cycles. The other iterations are more expensive (32-bit = 4 cycles, 6 instructions), but you can overlap it with the initial 3 iterations to reclaim the mov & 1 cycle. Also, overlapping 32-bit portions of the 64-bit value allow you to do them in parallel. I have not tested this yet, but I have a version that takes 14 cycles. -Matt
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.