Author: Roberto Waldteufel
Date: 17:54:34 11/03/98
Go up one level in this thread
On November 03, 1998 at 15:31:12, John Stoneham wrote: >On November 03, 1998 at 13:05:57, Robert Hyatt wrote: > >>On November 03, 1998 at 09:44:36, John Stoneham wrote: >> >>>On November 02, 1998 at 21:45:05, Robert Hyatt wrote: >>> >>>>a couple of tricks... >>>> >>>>1. you can use square&56 as the shift amount for the 90 degree rotated bit map, >>>>since that will get you exactly the right shift count for any rank... >>>> >>>>2. you may use a bishop shift value, but you can get away by always using 8 >>>>bits from the rotated diagonal bitmaps, if you simply set up the array so that >>>>the "unused bits" still produce the right answer. This avoids having a >>>>different mask for each diagonal length... >>> >>>The reason I pre-calculate even the simple shift and mask amounts, is that I >>>believe an operation such as ((bitboard & mask[sq]) >> shift[sq]) is probably >>>faster than, for example, ((bitboard >> (sq & 56)) & 0x00ff). I may be wrong, >>>and I should probably set up a simple test program to see what the actual >>>differnce is. >> >>It should be much slower. mask[sq] has *two* memory referencess. sq&56 has >>one. If you do that a lot, it might not matter as you should get a L1 cache >>hit (or at least a L2 hit). However, if you look at the above, on a non-alpha >>it gets even worse, because bitboard & mask is a 64 bit and operation. the >>bitboard>>is a 64 bit shift only... so on a 32 bit architecture the sq&56 >>ought to be signficantly faster... >> >>on the PC's you count memory references *not* instructions. Because the typical >>memory delay is 20+ clock cycles... > >OK. I ran a couple tests using a Pentium 133 (close to a worst-case senario for >64-bit instructions, I guess :) The first test used this form of the routine: >((bitboard & mask[sq]) >> shift[sq]), where mask[] was a 64-bit integer, and >shift[] was an 8-bit integer (actually a char type), and tested it's speed in >comparison to this form: ((bitboard >> (sq & 56)) & 0xff). > >The result was that the second form was, on average, about 2% faster, as per >your prediction. Then I ran another test against the second form, this time >using ((bitboard >> shift[sq]) & mask[sq]) where mask[] was an 8-bit integer >type. This was actually faster by about 0.5%. Obviously the change from a 64-bit >mask to an 8-bit mask was what made the difference. I don't know enough about >raw machine instructions etc., but I suppose that constants still have to be >loaded into a register at some point during the operation. In the tests, I used >"Release" mode optimizations for the compiler, just to be thorough. Have you tried a 32-bit shift? On the Intel machines running in 32-bit mode, you usually get the fastest code if you use 32-bit numbers all the time, or at least as far as this is possible. Strange as it may seem, using bytes or 16-bit integers is actually a lot slower in most cases where you have a choice. Also, I have head that the Pentiums give a worse penalty for arithmetic than logical operations (I have not tested this though), but that is good news for chess programming because the 64-bit bitboards tend to need logical rather than arithmetic operations to be performed on them. Best wishes, Roberto
This page took 0.01 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.