Author: Sune Fischer
Date: 17:00:50 02/01/02
Go up one level in this thread
On January 31, 2002 at 21:53:16, Wylie Garvin wrote: >On January 28, 2002 at 08:36:15, Sune Fischer wrote: > >>Hi, >> >>I'm in the process of rewriting my entire movegen, now I'm going to need >>bitboards I've decided, but it seems there are different approaches. >> >>I found this link (it was actually dead, but google had it cached so I put it up >>for a short while) >>http://www.fys.ku.dk/~fischer/Temp/New%20Technology.htm >> >>It is about reversed bitboards, the author claims it is faster than rotated: >> >>"When taking account of memory latencies, calculating the piece attacks using >>the forward and reverse bitboards can be done significantly faster due to total >>independence on lookup tables and complex calculations. There are a few slight >>snags with the diagonal calculations, but they are minor." >> >> >>His description ("significantly faster") entrigues me (hehe) but I can't find >>anything else on reversed bitboards and I've never heard of them before. >> >>What's the catch, is there something he is not telling? >> >>-S. > >Hi Sune, > > Here's the thing--on machines which DON'T have bit-scan instructions, you >need some technique which can tell you where the "ray" of reachable squares for >a sliding piece ends. > > The "rotated bitboard" approach lets you do a little bit of shifting and >masking to come up with an "occupancy number" for a rank/file/diagonal (i.e. of >which there are 4 per square), which then indexes into a LARGE table (I think >it's about 512KB) to return the desired bitboard with the reachable squares >along that rank/file/diagonal. 512kb is punishing on your cache though, so you >can merge together adjacent results and do an additional masking afterwards >(i.e. if looking up along a file, have the files for all 8 ranks combined into >one result board. if looking up along diagonals, combine by either rank or >file). This reduces the occupancy tables to about 32kb. > > The "reversed bitboard" approach takes advantage of the way arithmetic carry >works to locate the end of the ray for you. You take all pieces, mask with the >appropriate *ray* (i.e. of which there are 8 per square) and do the b^(b-1) >trick or some variant. The reason you need to maintain a reversed bitboard is >simply that arithmetic carry only goes in one direction! The main problem with >the reversed-bitboard approach seems to be that it *can* be costly, if you are >not careful, to combine the forward and reversed bitboards. In particular, you >have managed to avoid using a table by using this technique, so now you need >some efficient way to reverse the generated mask. Note you can do this BEFORE >anding the mask with the appropriate ray again...i.e. you can do it while it has >the form 0000..11111..0000, i.e. A 0's, B 1's and C 0's. In this case the >reversed mask has C 0's, B 1's and A 0's, and in theory you could do the >reversal with a single left or right shift, if you knew how far to shift it. > > When I was thinking of coding this up in assembly for the Pentium, my >strategy was: > (1) Use BSR/BSF (the bit-scan instructions on Intel) to discover A and C. > (2) shift the result left by C (so it now has the form 1111..0000000). > (3) shift the result right by A. > > But doing this efficiently was still pretty complicated, and finally I clued >in to the obvious better way of doing it (on Intel hardware anyway). > > So I offer here my preferred technique. It only works on machines with >bit-scan instructions. If you don't care about other platforms besides the IA32 >platform (or x86 platform as its called), then this is the one you should use: > - Take all pieces and mask with appropriate ray. > - Use the BSR or BSF instructions to find the first/last 1 in this bitboard >(which one you use depends on the "direction" of the ray. I number my squares >so that H1 is 0, G1 is 1, ..., A1 is 7, H2 is 8, ... H8 is 56, ... A8 is 63. In >my case, the W, NW, N and NE rays are "forward", so I use BSRs to find the end, >and the E, SE, S and SW rays are "backward" so I use BSFs to find the end). > - if you are generating captures, set the bit with the number found in your >result bitboard. If you are generating non-captures, you can either walk the >squares from your source square to the number found, recording each move, or you >can repeat the above steps for two opposing rays, and then for each one get the >ray mask pointing in the *opposite* direction and originating at the number >found, i.e. the first occupied square. Then *AND* the two such masks together >and OR that into your result. > > In either case, remember to mask the result with empty squares (non-captures) >or enemy pieces (captures) or you can get the wrong results. Also, note that >the result of BSR/BSF is generally not predictable if you use it on an empty >bitboard (the instruction sets the zero flag if this happens though, so you can >detect it). For this reason, before starting, I take the mask of all pieces and >OR in the borders of the board, and all my ray masks which would otherwise be >empty contain a single bit for the source square instead. This way, such "empty >rays" that stick off the board cause the source square to be added to the result >mask (for captures). The final step of masking with enemy pieces removes that >square. The final masking step also removes the source square from the >reachable squares along the diagonal which are obtained by ANDing the two >opposing rays as I tried to describe above. > > Anyway, I hope that wasn't too confused. Heh, I'm very confused. But I did figure out a method very similar to this. To reverse the reversed bitboard, one can do a LastBit() call and get the last attacked square. Now this ray can be looked up in a permanent table with index: [sq][63-LastBit()] (because we already know the other end of the ray). This can now be OR'ed to the previous forward attack board. I agree the table wouldn't need to be larger than 4096 Bytes, but finding the right index will be a bit more tricky. As you say it is probably a good idea to mask the 'sq' square down too, just so LastBit() returns something well defined. I'll admit, I don't understand all you've written because I don't know any assembler, but I understand the potential problem, 0 would not be a good return value because 0 does not mean 'no-square', it's the first square (in my notation). -S. >The best part about having BSR/BSF >is that the ONLY tables you need for doing sliding piece calculations are tables >of the actual rays. They take 8*64*8=4096 bytes, or 4K. On a 32-bit machine >like an Intel machine, you have to do BSR/BSF on the halves of the bitboard >separately. If you take advantage of this, and split your code based on which >half of the board the source square is in (lo half or hi half), you'll notice >that for 5 of the 8 rays, the result MUST be in the same half of the board and >only 1 BSR/BSF is required. The other 3 rays require 2. This also directly >implies that for each half of board, 5 of the 8 rays require only one half of >the bitboard to be stored in the table. I.e. at least 4*64*5 = 1280 bytes are >guaranteed to be zero. In my assembly move generator, I first decide which half >the source square is in, then proceed to do exactly the BSR/BSFs that are >necessary to identify the end of each ray. Since each insn is using half of a >bitboard from somewhere in that table, it is using one of 16 4-aligned offsets >per square, which are constants. I removed 4 of the 5 unused bitboard halves, >shrunk the table to 3K and simply changed my constants. > >have fun, >wylie
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.