Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Reversed vs. Rotated Bitboards

Author: Wylie Garvin

Date: 18:53:16 01/31/02

Go up one level in this thread


On January 28, 2002 at 08:36:15, Sune Fischer wrote:

>Hi,
>
>I'm in the process of rewriting my entire movegen, now I'm going to need
>bitboards I've decided, but it seems there are different approaches.
>
>I found this link (it was actually dead, but google had it cached so I put it up
>for a short while)
>http://www.fys.ku.dk/~fischer/Temp/New%20Technology.htm
>
>It is about reversed bitboards, the author claims it is faster than rotated:
>
>"When taking account of memory latencies, calculating the piece attacks using
>the forward and reverse bitboards can be done significantly faster due to total
>independence on lookup tables and complex calculations. There are a few slight
>snags with the diagonal calculations, but they are minor."
>
>
>His description ("significantly faster") entrigues me (hehe) but I can't find
>anything else on reversed bitboards and I've never heard of them before.
>
>What's the catch, is there something he is not telling?
>
>-S.

Hi Sune,

   Here's the thing--on machines which DON'T have bit-scan instructions, you
need some technique which can tell you where the "ray" of reachable squares for
a sliding piece ends.

   The "rotated bitboard" approach lets you do a little bit of shifting and
masking to come up with an "occupancy number" for a rank/file/diagonal (i.e. of
which there are 4 per square), which then indexes into a LARGE table (I think
it's about 512KB) to return the desired bitboard with the reachable squares
along that rank/file/diagonal.  512kb is punishing on your cache though, so you
can merge together adjacent results and do an additional masking afterwards
(i.e. if looking up along a file, have the files for all 8 ranks combined into
one result board.  if looking up along diagonals, combine by either rank or
file).  This reduces the occupancy tables to about 32kb.

   The "reversed bitboard" approach takes advantage of the way arithmetic carry
works to locate the end of the ray for you.  You take all pieces, mask with the
appropriate *ray* (i.e. of which there are 8 per square) and do the b^(b-1)
trick or some variant.  The reason you need to maintain a reversed bitboard is
simply that arithmetic carry only goes in one direction!  The main problem with
the reversed-bitboard approach seems to be that it *can* be costly, if you are
not careful, to combine the forward and reversed bitboards.  In particular, you
have managed to avoid using a table by using this technique, so now you need
some efficient way to reverse the generated mask.  Note you can do this BEFORE
anding the mask with the appropriate ray again...i.e. you can do it while it has
the form 0000..11111..0000, i.e. A 0's, B 1's and C 0's.  In this case the
reversed mask has C 0's, B 1's and A 0's, and in theory you could do the
reversal with a single left or right shift, if you knew how far to shift it.

   When I was thinking of coding this up in assembly for the Pentium, my
strategy was:
    (1) Use BSR/BSF (the bit-scan instructions on Intel) to discover A and C.
    (2) shift the result left by C (so it now has the form 1111..0000000).
    (3) shift the result right by A.

   But doing this efficiently was still pretty complicated, and finally I clued
in to the obvious better way of doing it (on Intel hardware anyway).

   So I offer here my preferred technique.  It only works on machines with
bit-scan instructions.  If you don't care about other platforms besides the IA32
platform (or x86 platform as its called), then this is the one you should use:
    - Take all pieces and mask with appropriate ray.
    - Use the BSR or BSF instructions to find the first/last 1 in this bitboard
(which one you use depends on the "direction" of the ray.  I number my squares
so that H1 is 0, G1 is 1, ..., A1 is 7, H2 is 8, ... H8 is 56, ... A8 is 63.  In
my case, the W, NW, N and NE rays are "forward", so I use BSRs to find the end,
and the E, SE, S and SW rays are "backward" so I use BSFs to find the end).
    - if you are generating captures, set the bit with the number found in your
result bitboard.  If you are generating non-captures, you can either walk the
squares from your source square to the number found, recording each move, or you
can repeat the above steps for two opposing rays, and then for each one get the
ray mask pointing in the *opposite* direction and originating at the number
found, i.e. the first occupied square.  Then *AND* the two such masks together
and OR that into your result.

  In either case, remember to mask the result with empty squares (non-captures)
or enemy pieces (captures) or you can get the wrong results.  Also, note that
the result of BSR/BSF is generally not predictable if you use it on an empty
bitboard (the instruction sets the zero flag if this happens though, so you can
detect it).  For this reason, before starting, I take the mask of all pieces and
OR in the borders of the board, and all my ray masks which would otherwise be
empty contain a single bit for the source square instead.  This way, such "empty
rays" that stick off the board cause the source square to be added to the result
mask (for captures).  The final step of masking with enemy pieces removes that
square.  The final masking step also removes the source square from the
reachable squares along the diagonal which are obtained by ANDing the two
opposing rays as I tried to describe above.

  Anyway, I hope that wasn't too confused.  The best part about having BSR/BSF
is that the ONLY tables you need for doing sliding piece calculations are tables
of the actual rays.  They take 8*64*8=4096 bytes, or 4K.  On a 32-bit machine
like an Intel machine, you have to do BSR/BSF on the halves of the bitboard
separately.  If you take advantage of this, and split your code based on which
half of the board the source square is in (lo half or hi half), you'll notice
that for 5 of the 8 rays, the result MUST be in the same half of the board and
only 1 BSR/BSF is required.  The other 3 rays require 2.  This also directly
implies that for each half of board, 5 of the 8 rays require only one half of
the bitboard to be stored in the table.  I.e. at least 4*64*5 = 1280 bytes are
guaranteed to be zero.  In my assembly move generator, I first decide which half
the source square is in, then proceed to do exactly the BSR/BSFs that are
necessary to identify the end of each ray.  Since each insn is using half of a
bitboard from somewhere in that table, it is using one of 16 4-aligned offsets
per square, which are constants.  I removed 4 of the 5 unused bitboard halves,
shrunk the table to 3K and simply changed my constants.

have fun,
wylie



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.