Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Reversed vs. Rotated Bitboards

Author: Sune Fischer

Date: 17:00:50 02/01/02

Go up one level in this thread


On January 31, 2002 at 21:53:16, Wylie Garvin wrote:

>On January 28, 2002 at 08:36:15, Sune Fischer wrote:
>
>>Hi,
>>
>>I'm in the process of rewriting my entire movegen, now I'm going to need
>>bitboards I've decided, but it seems there are different approaches.
>>
>>I found this link (it was actually dead, but google had it cached so I put it up
>>for a short while)
>>http://www.fys.ku.dk/~fischer/Temp/New%20Technology.htm
>>
>>It is about reversed bitboards, the author claims it is faster than rotated:
>>
>>"When taking account of memory latencies, calculating the piece attacks using
>>the forward and reverse bitboards can be done significantly faster due to total
>>independence on lookup tables and complex calculations. There are a few slight
>>snags with the diagonal calculations, but they are minor."
>>
>>
>>His description ("significantly faster") entrigues me (hehe) but I can't find
>>anything else on reversed bitboards and I've never heard of them before.
>>
>>What's the catch, is there something he is not telling?
>>
>>-S.
>
>Hi Sune,
>
>   Here's the thing--on machines which DON'T have bit-scan instructions, you
>need some technique which can tell you where the "ray" of reachable squares for
>a sliding piece ends.
>
>   The "rotated bitboard" approach lets you do a little bit of shifting and
>masking to come up with an "occupancy number" for a rank/file/diagonal (i.e. of
>which there are 4 per square), which then indexes into a LARGE table (I think
>it's about 512KB) to return the desired bitboard with the reachable squares
>along that rank/file/diagonal.  512kb is punishing on your cache though, so you
>can merge together adjacent results and do an additional masking afterwards
>(i.e. if looking up along a file, have the files for all 8 ranks combined into
>one result board.  if looking up along diagonals, combine by either rank or
>file).  This reduces the occupancy tables to about 32kb.
>
>   The "reversed bitboard" approach takes advantage of the way arithmetic carry
>works to locate the end of the ray for you.  You take all pieces, mask with the
>appropriate *ray* (i.e. of which there are 8 per square) and do the b^(b-1)
>trick or some variant.  The reason you need to maintain a reversed bitboard is
>simply that arithmetic carry only goes in one direction!  The main problem with
>the reversed-bitboard approach seems to be that it *can* be costly, if you are
>not careful, to combine the forward and reversed bitboards.  In particular, you
>have managed to avoid using a table by using this technique, so now you need
>some efficient way to reverse the generated mask.  Note you can do this BEFORE
>anding the mask with the appropriate ray again...i.e. you can do it while it has
>the form 0000..11111..0000, i.e. A 0's, B 1's and C 0's.  In this case the
>reversed mask has C 0's, B 1's and A 0's, and in theory you could do the
>reversal with a single left or right shift, if you knew how far to shift it.
>
>   When I was thinking of coding this up in assembly for the Pentium, my
>strategy was:
>    (1) Use BSR/BSF (the bit-scan instructions on Intel) to discover A and C.
>    (2) shift the result left by C (so it now has the form 1111..0000000).
>    (3) shift the result right by A.
>
>   But doing this efficiently was still pretty complicated, and finally I clued
>in to the obvious better way of doing it (on Intel hardware anyway).
>
>   So I offer here my preferred technique.  It only works on machines with
>bit-scan instructions.  If you don't care about other platforms besides the IA32
>platform (or x86 platform as its called), then this is the one you should use:
>    - Take all pieces and mask with appropriate ray.
>    - Use the BSR or BSF instructions to find the first/last 1 in this bitboard
>(which one you use depends on the "direction" of the ray.  I number my squares
>so that H1 is 0, G1 is 1, ..., A1 is 7, H2 is 8, ... H8 is 56, ... A8 is 63.  In
>my case, the W, NW, N and NE rays are "forward", so I use BSRs to find the end,
>and the E, SE, S and SW rays are "backward" so I use BSFs to find the end).
>    - if you are generating captures, set the bit with the number found in your
>result bitboard.  If you are generating non-captures, you can either walk the
>squares from your source square to the number found, recording each move, or you
>can repeat the above steps for two opposing rays, and then for each one get the
>ray mask pointing in the *opposite* direction and originating at the number
>found, i.e. the first occupied square.  Then *AND* the two such masks together
>and OR that into your result.
>
>  In either case, remember to mask the result with empty squares (non-captures)
>or enemy pieces (captures) or you can get the wrong results.  Also, note that
>the result of BSR/BSF is generally not predictable if you use it on an empty
>bitboard (the instruction sets the zero flag if this happens though, so you can
>detect it).  For this reason, before starting, I take the mask of all pieces and
>OR in the borders of the board, and all my ray masks which would otherwise be
>empty contain a single bit for the source square instead.  This way, such "empty
>rays" that stick off the board cause the source square to be added to the result
>mask (for captures).  The final step of masking with enemy pieces removes that
>square.  The final masking step also removes the source square from the
>reachable squares along the diagonal which are obtained by ANDing the two
>opposing rays as I tried to describe above.
>
>  Anyway, I hope that wasn't too confused.

Heh, I'm very confused.

But I did figure out a method very similar to this.
To reverse the reversed bitboard, one can do a LastBit() call and get the last
attacked square. Now this ray can be looked up in a permanent table with
index: [sq][63-LastBit()] (because we already know the other end of the ray).
This can now be OR'ed to the previous forward attack board. I agree the table
wouldn't need to be larger than 4096 Bytes, but finding the right index will be
a bit more tricky. As you say it is probably a good idea to mask the 'sq' square
down too, just so LastBit() returns something well defined.
I'll admit, I don't understand all you've written because I don't know any
assembler, but I understand the potential problem, 0 would not be a good return
value because 0 does not mean 'no-square', it's the first square (in my
notation).

-S.

>The best part about having BSR/BSF
>is that the ONLY tables you need for doing sliding piece calculations are tables
>of the actual rays.  They take 8*64*8=4096 bytes, or 4K.  On a 32-bit machine
>like an Intel machine, you have to do BSR/BSF on the halves of the bitboard
>separately.  If you take advantage of this, and split your code based on which
>half of the board the source square is in (lo half or hi half), you'll notice
>that for 5 of the 8 rays, the result MUST be in the same half of the board and
>only 1 BSR/BSF is required.  The other 3 rays require 2.  This also directly
>implies that for each half of board, 5 of the 8 rays require only one half of
>the bitboard to be stored in the table.  I.e. at least 4*64*5 = 1280 bytes are
>guaranteed to be zero.  In my assembly move generator, I first decide which half
>the source square is in, then proceed to do exactly the BSR/BSFs that are
>necessary to identify the end of each ray.  Since each insn is using half of a
>bitboard from somewhere in that table, it is using one of 16 4-aligned offsets
>per square, which are constants.  I removed 4 of the 5 unused bitboard halves,
>shrunk the table to 3K and simply changed my constants.
>
>have fun,
>wylie



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.