Author: Wylie Garvin
Date: 18:53:16 01/31/02
Go up one level in this thread
On January 28, 2002 at 08:36:15, Sune Fischer wrote: >Hi, > >I'm in the process of rewriting my entire movegen, now I'm going to need >bitboards I've decided, but it seems there are different approaches. > >I found this link (it was actually dead, but google had it cached so I put it up >for a short while) >http://www.fys.ku.dk/~fischer/Temp/New%20Technology.htm > >It is about reversed bitboards, the author claims it is faster than rotated: > >"When taking account of memory latencies, calculating the piece attacks using >the forward and reverse bitboards can be done significantly faster due to total >independence on lookup tables and complex calculations. There are a few slight >snags with the diagonal calculations, but they are minor." > > >His description ("significantly faster") entrigues me (hehe) but I can't find >anything else on reversed bitboards and I've never heard of them before. > >What's the catch, is there something he is not telling? > >-S. Hi Sune, Here's the thing--on machines which DON'T have bit-scan instructions, you need some technique which can tell you where the "ray" of reachable squares for a sliding piece ends. The "rotated bitboard" approach lets you do a little bit of shifting and masking to come up with an "occupancy number" for a rank/file/diagonal (i.e. of which there are 4 per square), which then indexes into a LARGE table (I think it's about 512KB) to return the desired bitboard with the reachable squares along that rank/file/diagonal. 512kb is punishing on your cache though, so you can merge together adjacent results and do an additional masking afterwards (i.e. if looking up along a file, have the files for all 8 ranks combined into one result board. if looking up along diagonals, combine by either rank or file). This reduces the occupancy tables to about 32kb. The "reversed bitboard" approach takes advantage of the way arithmetic carry works to locate the end of the ray for you. You take all pieces, mask with the appropriate *ray* (i.e. of which there are 8 per square) and do the b^(b-1) trick or some variant. The reason you need to maintain a reversed bitboard is simply that arithmetic carry only goes in one direction! The main problem with the reversed-bitboard approach seems to be that it *can* be costly, if you are not careful, to combine the forward and reversed bitboards. In particular, you have managed to avoid using a table by using this technique, so now you need some efficient way to reverse the generated mask. Note you can do this BEFORE anding the mask with the appropriate ray again...i.e. you can do it while it has the form 0000..11111..0000, i.e. A 0's, B 1's and C 0's. In this case the reversed mask has C 0's, B 1's and A 0's, and in theory you could do the reversal with a single left or right shift, if you knew how far to shift it. When I was thinking of coding this up in assembly for the Pentium, my strategy was: (1) Use BSR/BSF (the bit-scan instructions on Intel) to discover A and C. (2) shift the result left by C (so it now has the form 1111..0000000). (3) shift the result right by A. But doing this efficiently was still pretty complicated, and finally I clued in to the obvious better way of doing it (on Intel hardware anyway). So I offer here my preferred technique. It only works on machines with bit-scan instructions. If you don't care about other platforms besides the IA32 platform (or x86 platform as its called), then this is the one you should use: - Take all pieces and mask with appropriate ray. - Use the BSR or BSF instructions to find the first/last 1 in this bitboard (which one you use depends on the "direction" of the ray. I number my squares so that H1 is 0, G1 is 1, ..., A1 is 7, H2 is 8, ... H8 is 56, ... A8 is 63. In my case, the W, NW, N and NE rays are "forward", so I use BSRs to find the end, and the E, SE, S and SW rays are "backward" so I use BSFs to find the end). - if you are generating captures, set the bit with the number found in your result bitboard. If you are generating non-captures, you can either walk the squares from your source square to the number found, recording each move, or you can repeat the above steps for two opposing rays, and then for each one get the ray mask pointing in the *opposite* direction and originating at the number found, i.e. the first occupied square. Then *AND* the two such masks together and OR that into your result. In either case, remember to mask the result with empty squares (non-captures) or enemy pieces (captures) or you can get the wrong results. Also, note that the result of BSR/BSF is generally not predictable if you use it on an empty bitboard (the instruction sets the zero flag if this happens though, so you can detect it). For this reason, before starting, I take the mask of all pieces and OR in the borders of the board, and all my ray masks which would otherwise be empty contain a single bit for the source square instead. This way, such "empty rays" that stick off the board cause the source square to be added to the result mask (for captures). The final step of masking with enemy pieces removes that square. The final masking step also removes the source square from the reachable squares along the diagonal which are obtained by ANDing the two opposing rays as I tried to describe above. Anyway, I hope that wasn't too confused. The best part about having BSR/BSF is that the ONLY tables you need for doing sliding piece calculations are tables of the actual rays. They take 8*64*8=4096 bytes, or 4K. On a 32-bit machine like an Intel machine, you have to do BSR/BSF on the halves of the bitboard separately. If you take advantage of this, and split your code based on which half of the board the source square is in (lo half or hi half), you'll notice that for 5 of the 8 rays, the result MUST be in the same half of the board and only 1 BSR/BSF is required. The other 3 rays require 2. This also directly implies that for each half of board, 5 of the 8 rays require only one half of the bitboard to be stored in the table. I.e. at least 4*64*5 = 1280 bytes are guaranteed to be zero. In my assembly move generator, I first decide which half the source square is in, then proceed to do exactly the BSR/BSFs that are necessary to identify the end of each ray. Since each insn is using half of a bitboard from somewhere in that table, it is using one of 16 4-aligned offsets per square, which are constants. I removed 4 of the 5 unused bitboard halves, shrunk the table to 3K and simply changed my constants. have fun, wylie
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.