Author: John Stoneham
Date: 08:20:16 11/04/98
Go up one level in this thread
Roberto's suggestion of trying 32-bit values led me to do a more complete test, comparing these approaches. I used 12 different forms of ((bitboard >> shift) & mask), in each the bitboard was a 64-bit integer. In the first set, the shift amount and the mask were each stored in an array. The mask array ranged from 64-bit integers, to 32-bit, 16-bit and 8-bit. The next set replaced the shift array with the operation (square&56), while the mask array was replaced with a constant, which ranged from 64-bit down through 8-bits. The last set kept the mask constant, but added back the shift array. These were done on a Pentium 133. The results surprised me. The fastest form used the array shift amount, and a 16-bit constant for the mask. In each set, the 16-bit masks resulted in the fastest times. This is obviously CPU/compiler dependent. The difference between the slowest and fastest times was about 2%. I'm sure I didn't use the most definitive form of testing, since it only consisted of timed loops, reporting the number of iterations for each form during that specified time. To get very accurate, I probably need to look at the assembly instructions for each form as generated by the compiler and then add up the clock cycles for each. But I think that is getting a little over my head... On November 03, 1998 at 20:54:34, Roberto Waldteufel wrote: > >On November 03, 1998 at 15:31:12, John Stoneham wrote: > >>On November 03, 1998 at 13:05:57, Robert Hyatt wrote: >> >>>On November 03, 1998 at 09:44:36, John Stoneham wrote: >>> >>>>On November 02, 1998 at 21:45:05, Robert Hyatt wrote: >>>> >>>>>a couple of tricks... >>>>> >>>>>1. you can use square&56 as the shift amount for the 90 degree rotated bit map, >>>>>since that will get you exactly the right shift count for any rank... >>>>> >>>>>2. you may use a bishop shift value, but you can get away by always using 8 >>>>>bits from the rotated diagonal bitmaps, if you simply set up the array so that >>>>>the "unused bits" still produce the right answer. This avoids having a >>>>>different mask for each diagonal length... >>>> >>>>The reason I pre-calculate even the simple shift and mask amounts, is that I >>>>believe an operation such as ((bitboard & mask[sq]) >> shift[sq]) is probably >>>>faster than, for example, ((bitboard >> (sq & 56)) & 0x00ff). I may be wrong, >>>>and I should probably set up a simple test program to see what the actual >>>>differnce is. >>> >>>It should be much slower. mask[sq] has *two* memory referencess. sq&56 has >>>one. If you do that a lot, it might not matter as you should get a L1 cache >>>hit (or at least a L2 hit). However, if you look at the above, on a non-alpha >>>it gets even worse, because bitboard & mask is a 64 bit and operation. the >>>bitboard>>is a 64 bit shift only... so on a 32 bit architecture the sq&56 >>>ought to be signficantly faster... >>> >>>on the PC's you count memory references *not* instructions. Because the typical >>>memory delay is 20+ clock cycles... >> >>OK. I ran a couple tests using a Pentium 133 (close to a worst-case senario for >>64-bit instructions, I guess :) The first test used this form of the routine: >>((bitboard & mask[sq]) >> shift[sq]), where mask[] was a 64-bit integer, and >>shift[] was an 8-bit integer (actually a char type), and tested it's speed in >>comparison to this form: ((bitboard >> (sq & 56)) & 0xff). >> >>The result was that the second form was, on average, about 2% faster, as per >>your prediction. Then I ran another test against the second form, this time >>using ((bitboard >> shift[sq]) & mask[sq]) where mask[] was an 8-bit integer >>type. This was actually faster by about 0.5%. Obviously the change from a 64-bit >>mask to an 8-bit mask was what made the difference. I don't know enough about >>raw machine instructions etc., but I suppose that constants still have to be >>loaded into a register at some point during the operation. In the tests, I used >>"Release" mode optimizations for the compiler, just to be thorough. > >Have you tried a 32-bit shift? On the Intel machines running in 32-bit mode, you >usually get the fastest code if you use 32-bit numbers all the time, or at least >as far as this is possible. Strange as it may seem, using bytes or 16-bit >integers is actually a lot slower in most cases where you have a choice. Also, I >have head that the Pentiums give a worse penalty for arithmetic than logical >operations (I have not tested this though), but that is good news for chess >programming because the 64-bit bitboards tend to need logical rather than >arithmetic operations to be performed on them. > >Best wishes, >Roberto
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.