Author: Matt Taylor
Date: 03:47:09 01/22/03
Go up one level in this thread
On January 22, 2003 at 06:44:08, Dann Corbit wrote: >On January 22, 2003 at 06:09:22, Dann Corbit wrote: > >>On January 22, 2003 at 05:33:20, Matt Taylor wrote: >> >>>On January 22, 2003 at 04:27:16, Dann Corbit wrote: >>> >>>>On January 22, 2003 at 03:29:05, Matt Taylor wrote: >>>> >>>>>On January 21, 2003 at 17:03:33, Dann Corbit wrote: >>>>> >>>>>>On January 21, 2003 at 15:48:57, Sander de Zoete wrote: >>>>>> >>>>>>>The following instruction I found for the new architecture of Intel Chips >>>>>>> >>>>>>>BSWAP—Byte Swap >>>>>>> >>>>>>>Description >>>>>>>Reverses the byte order of a 32-bit (destination) register: >>>>>>> >>>>>>>Operation >>>>>>>TEMP ¬ DEST >>>>>>>DEST(7..0) ¬ TEMP(31..24) >>>>>>>DEST(15..8) ¬ TEMP(23..16) >>>>>>>DEST(23..16) ¬ TEMP(15..8) >>>>>>>DEST(31..24) ¬ TEMP(7..0) >>>>>>>Flags Affected >>>>>>>None. >>>>>>>Opcode Instruction Description >>>>>>>0F C8+ rd BSWAP r32 Reverses the byte order of a 32-bit register. >>>>>>> >>>>>>>It is only valid for 486 architecture. >>>>>>> >>>>>>>If this instruction can be used, it should be very easy to reverse Reverse >>>>>>>BitBoards into Forward Boards again. Saving a lot of updating in make and unmake >>>>>>>move and also be much faster using the bitboards for generating attack >>>>>>>board for evaluation purposes, incheck checks etc. >>>>>>> >>>>>>>The thing left to do is to reverse the bits per byte. >>>>>>> >>>>>>>I use the following trick, but it doesn't seem to work voor the value 9. Let me >>>>>>>show you what I mean: (and ofcourse, can you help me?) >>>>>>> >>>>>>>In C code, it looks like this: >>>>>>> >>>>>>>typedef unsigned __int64 BITBOARD; >>>>>>> >>>>>>>void ReverseBitsPerByte(BITBOARD bitboard) >>>>>>>{ >>>>>>>// 1. Load the constant, k = 5555555555555555h >>>>>>>BITBOARD k = 0x5555555555555555; >>>>>>> >>>>>>>// 2. x = [(x shl 1) and k] or [(x and k) shr 1] result is: EFCDAB8967452301 >>>>>>>bitboard = ((bitboard<<1) & k) | ((bitboard & k)>>1 ); >>>>>>>} >>>>>>> >>>>>>>Initial bitboard: >>>>>>>11111110 >>>>>>>11011100 >>>>>>>10111010 >>>>>>>10011000 >>>>>>>01110110 >>>>>>>01010100 >>>>>>>00110010 >>>>>>>00010000 >>>>>>> >>>>>>>Result of ReverseBitsPerByte(bitboard): >>>>>>>01111111 >>>>>>>00111011 >>>>>>>01011101 >>>>>>>00011000<--what the ^*&* goes wrong here? Should be 00011001. >>>>>>>01101110 >>>>>>>00101010 >>>>>>>01001100 >>>>>>>00001000 >>>>>>> >>>>>>>Thanks for any help or suggestions. >>>>>> >>>>><snip code> >>>>> >>>>>These are all basically the same algorithm. It's going to take a while to >>>>>rearrange everything when you do it bit-by-bit. Sander has the right idea by >>>>>first swapping bytes and then trying to reverse the bits within the bytes (8*2 >>>>>ops + the swap as opposed to 64*2 ops). >>>> >>>>The algorithms don't perform identically. >>>>They don't require 64 ops. >>>>Once you get the "Sander" algorithm completed, compare the two (and also the >>>>COBRA algorithm). >>> >>>Yes, breverse5 is a divide-and-conquer algorithm. The others all iterate 64 >>>times. Assuming 2 ops/iteration was a gross underestimate as they use heavy >>>masking and shifting. >>> >>>2 bswaps (and probably a mov, too) will reverse the bytes; afterward, the bits >>>can be reversed as well. >>> >>>Not sure when or if I can have a look at COBRA and think about the algorithm -- >>>I've got more than enough mandatory work right now. >> >>The COBRA algorithm is designed for big bit sequences. I don't know how well it >>adapts to 64 bits. I have only just downloaded the paper. >> >>This algorithm: >> >>typedef unsigned long ling Bitboard; >> >>Bitboard breverse5(Bitboard n) >>{ >> int i = 64; >> Bitboard m = -1; >> while (i /= 2) >> m ^= m << i, n = n >> i & m | (n & m) << i; >> return n; >>} >> >>iterates 6 times. I suspect it will be faster than something table driven. >>Perhaps also faster than 8 mask and shift operations. >> >>I'll be surprised if anything can beat it by 20%. > >Here is the generated assembly language. It would look a lot prettier on a 64 >bit chip, for sure. <snip> Yes. Note: 0006a e8 fc ff ff ff call __allshl ;C:\tmp\brev.cpp ^ Yuck. This is why parallel 32-bit w/bswap is much faster. If you want, I can time your algorithm. It will be quite nasty (because of inefficient 64-bit shifts), I assure you. Each call is 8 clocks or more. -Matt
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.