Computer Chess Club Archives

Search

Terms
Messages

Subject: Re: Extracting information from rotated Bitboards

Author: John Stoneham
Date: 08:20:16 11/04/98
Roberto's suggestion of trying 32-bit values led me to do a more complete test,
comparing these approaches. I used 12 different forms of ((bitboard >> shift) &
mask), in each the bitboard was a 64-bit integer. In the first set, the shift
amount and the mask were each stored in an array. The mask array ranged from
64-bit integers, to 32-bit, 16-bit and 8-bit. The next set replaced the shift
array with the operation (square&56), while the mask array was replaced with a
constant, which ranged from 64-bit down through 8-bits. The last set kept the
mask constant, but added back the shift array. These were done on a Pentium 133.
The results surprised me.

The fastest form used the array shift amount, and a 16-bit constant for the
mask. In each set, the 16-bit masks resulted in the fastest times. This is
obviously CPU/compiler dependent. The difference between the slowest and fastest
times was about 2%. I'm sure I didn't use the most definitive form of testing,
since it only consisted of timed loops, reporting the number of iterations for
each form during that specified time. To get very accurate, I probably need to
look at the assembly instructions for each form as generated by the compiler and
then add up the clock cycles for each. But I think that is getting a little over
my head...

On November 03, 1998 at 20:54:34, Roberto Waldteufel wrote:
>
>On November 03, 1998 at 15:31:12, John Stoneham wrote:
>
>>On November 03, 1998 at 13:05:57, Robert Hyatt wrote:
>>
>>>On November 03, 1998 at 09:44:36, John Stoneham wrote:
>>>
>>>>On November 02, 1998 at 21:45:05, Robert Hyatt wrote:
>>>>
>>>>>a couple of tricks...
>>>>>
>>>>>1.  you can use square&56 as the shift amount for the 90 degree rotated bit map,
>>>>>since that will get you exactly the right shift count for any rank...
>>>>>
>>>>>2.  you may use a bishop shift value, but you can get away by always using 8
>>>>>bits from the rotated diagonal bitmaps, if you simply set up the array so that
>>>>>the "unused bits" still produce the right answer.  This avoids having a
>>>>>different mask for each diagonal length...
>>>>
>>>>The reason I pre-calculate even the simple shift and mask amounts, is that I
>>>>believe an operation such as ((bitboard & mask[sq]) >> shift[sq]) is probably
>>>>faster than, for example, ((bitboard >> (sq & 56)) & 0x00ff). I may be wrong,
>>>>and I should probably set up a simple test program to see what the actual
>>>>differnce is.
>>>
>>>It should be much slower.  mask[sq] has *two* memory referencess.  sq&56 has
>>>one.  If you do that a lot, it might not matter as you should get a L1 cache
>>>hit (or at least a L2 hit).  However, if you look at the above, on a non-alpha
>>>it gets even worse, because bitboard & mask is a 64 bit and operation.  the
>>>bitboard>>is a 64 bit shift only...  so on a 32 bit architecture the sq&56
>>>ought to be signficantly faster...
>>>
>>>on the PC's you count memory references *not* instructions.  Because the typical
>>>memory delay is 20+ clock cycles...
>>
>>OK. I ran a couple tests using a Pentium 133 (close to a worst-case senario for
>>64-bit instructions, I guess :) The first test used this form of the routine:
>>((bitboard & mask[sq]) >> shift[sq]), where mask[] was a 64-bit integer, and
>>shift[] was an 8-bit integer (actually a char type), and tested it's speed in
>>comparison to this form: ((bitboard >> (sq & 56)) & 0xff).
>>
>>The result was that the second form was, on average, about 2% faster, as per
>>your prediction. Then I ran another test against the second form, this time
>>using ((bitboard >> shift[sq]) & mask[sq]) where mask[] was an 8-bit integer
>>type. This was actually faster by about 0.5%. Obviously the change from a 64-bit
>>mask to an 8-bit mask was what made the difference. I don't know enough about
>>raw machine instructions etc., but I suppose that constants still have to be
>>loaded into a register at some point during the operation. In the tests, I used
>>"Release" mode optimizations for the compiler, just to be thorough.
>
>Have you tried a 32-bit shift? On the Intel machines running in 32-bit mode, you
>usually get the fastest code if you use 32-bit numbers all the time, or at least
>as far as this is possible. Strange as it may seem, using bytes or 16-bit
>integers is actually a lot slower in most cases where you have a choice. Also, I
>have head that the Pentiums give a worse penalty for arithmetic than logical
>operations (I have not tested this though), but that is good news for chess
>programming because the 64-bit bitboards tend to need logical rather than
>arithmetic operations to be performed on them.
>
>Best wishes,
>Roberto
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.