Author: Dann Corbit
Date: 04:00:30 01/22/03
Go up one level in this thread
On January 22, 2003 at 06:47:09, Matt Taylor wrote: >On January 22, 2003 at 06:44:08, Dann Corbit wrote: > >>On January 22, 2003 at 06:09:22, Dann Corbit wrote: >> >>>On January 22, 2003 at 05:33:20, Matt Taylor wrote: >>> >>>>On January 22, 2003 at 04:27:16, Dann Corbit wrote: >>>> >>>>>On January 22, 2003 at 03:29:05, Matt Taylor wrote: >>>>> >>>>>>On January 21, 2003 at 17:03:33, Dann Corbit wrote: >>>>>> >>>>>>>On January 21, 2003 at 15:48:57, Sander de Zoete wrote: >>>>>>> >>>>>>>>The following instruction I found for the new architecture of Intel Chips >>>>>>>> >>>>>>>>BSWAP—Byte Swap >>>>>>>> >>>>>>>>Description >>>>>>>>Reverses the byte order of a 32-bit (destination) register: >>>>>>>> >>>>>>>>Operation >>>>>>>>TEMP ¬ DEST >>>>>>>>DEST(7..0) ¬ TEMP(31..24) >>>>>>>>DEST(15..8) ¬ TEMP(23..16) >>>>>>>>DEST(23..16) ¬ TEMP(15..8) >>>>>>>>DEST(31..24) ¬ TEMP(7..0) >>>>>>>>Flags Affected >>>>>>>>None. >>>>>>>>Opcode Instruction Description >>>>>>>>0F C8+ rd BSWAP r32 Reverses the byte order of a 32-bit register. >>>>>>>> >>>>>>>>It is only valid for 486 architecture. >>>>>>>> >>>>>>>>If this instruction can be used, it should be very easy to reverse Reverse >>>>>>>>BitBoards into Forward Boards again. Saving a lot of updating in make and unmake >>>>>>>>move and also be much faster using the bitboards for generating attack >>>>>>>>board for evaluation purposes, incheck checks etc. >>>>>>>> >>>>>>>>The thing left to do is to reverse the bits per byte. >>>>>>>> >>>>>>>>I use the following trick, but it doesn't seem to work voor the value 9. Let me >>>>>>>>show you what I mean: (and ofcourse, can you help me?) >>>>>>>> >>>>>>>>In C code, it looks like this: >>>>>>>> >>>>>>>>typedef unsigned __int64 BITBOARD; >>>>>>>> >>>>>>>>void ReverseBitsPerByte(BITBOARD bitboard) >>>>>>>>{ >>>>>>>>// 1. Load the constant, k = 5555555555555555h >>>>>>>>BITBOARD k = 0x5555555555555555; >>>>>>>> >>>>>>>>// 2. x = [(x shl 1) and k] or [(x and k) shr 1] result is: EFCDAB8967452301 >>>>>>>>bitboard = ((bitboard<<1) & k) | ((bitboard & k)>>1 ); >>>>>>>>} >>>>>>>> >>>>>>>>Initial bitboard: >>>>>>>>11111110 >>>>>>>>11011100 >>>>>>>>10111010 >>>>>>>>10011000 >>>>>>>>01110110 >>>>>>>>01010100 >>>>>>>>00110010 >>>>>>>>00010000 >>>>>>>> >>>>>>>>Result of ReverseBitsPerByte(bitboard): >>>>>>>>01111111 >>>>>>>>00111011 >>>>>>>>01011101 >>>>>>>>00011000<--what the ^*&* goes wrong here? Should be 00011001. >>>>>>>>01101110 >>>>>>>>00101010 >>>>>>>>01001100 >>>>>>>>00001000 >>>>>>>> >>>>>>>>Thanks for any help or suggestions. >>>>>>> >>>>>><snip code> >>>>>> >>>>>>These are all basically the same algorithm. It's going to take a while to >>>>>>rearrange everything when you do it bit-by-bit. Sander has the right idea by >>>>>>first swapping bytes and then trying to reverse the bits within the bytes (8*2 >>>>>>ops + the swap as opposed to 64*2 ops). >>>>> >>>>>The algorithms don't perform identically. >>>>>They don't require 64 ops. >>>>>Once you get the "Sander" algorithm completed, compare the two (and also the >>>>>COBRA algorithm). >>>> >>>>Yes, breverse5 is a divide-and-conquer algorithm. The others all iterate 64 >>>>times. Assuming 2 ops/iteration was a gross underestimate as they use heavy >>>>masking and shifting. >>>> >>>>2 bswaps (and probably a mov, too) will reverse the bytes; afterward, the bits >>>>can be reversed as well. >>>> >>>>Not sure when or if I can have a look at COBRA and think about the algorithm -- >>>>I've got more than enough mandatory work right now. >>> >>>The COBRA algorithm is designed for big bit sequences. I don't know how well it >>>adapts to 64 bits. I have only just downloaded the paper. >>> >>>This algorithm: >>> >>>typedef unsigned long ling Bitboard; >>> >>>Bitboard breverse5(Bitboard n) >>>{ >>> int i = 64; >>> Bitboard m = -1; >>> while (i /= 2) >>> m ^= m << i, n = n >> i & m | (n & m) << i; >>> return n; >>>} >>> >>>iterates 6 times. I suspect it will be faster than something table driven. >>>Perhaps also faster than 8 mask and shift operations. >>> >>>I'll be surprised if anything can beat it by 20%. >> >>Here is the generated assembly language. It would look a lot prettier on a 64 >>bit chip, for sure. ><snip> > >Yes. Note: >0006a e8 fc ff ff ff call __allshl ;C:\tmp\brev.cpp > ^ Yuck. > >This is why parallel 32-bit w/bswap is much faster. > >If you want, I can time your algorithm. It will be quite nasty (because of >inefficient 64-bit shifts), I assure you. Each call is 8 clocks or more. This will be the rough alternative using the suggested algorithm: unsigned int reverse32(register unsigned int x) { register unsigned int y = 0x55555555; x = (((x >> 1) & y) | ((x & y) << 1)); y = 0x33333333; x = (((x >> 2) & y) | ((x & y) << 2)); y = 0x0f0f0f0f; x = (((x >> 4) & y) | ((x & y) << 4)); y = 0x00ff00ff; x = (((x >> 8) & y) | ((x & y) << 8)); return((x >> 16) | (x << 16)); } typedef union tag_faker { unsigned int a[2]; unsigned _int64 b; } faker; unsigned __int64 reverse64(faker x) { unsigned int a = reverse32(x.a[0]); x.a[0] = reverse32(x.a[1]); x.a[1] = a; return x.b; } When we get chips with real 64 bit CPUs, the speed will pick up a ton.
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.