Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Bitboards, pre-computing moves, and blocked squares?

Author: Vasik Rajlich

Date: 04:22:10 12/03/04

Go up one level in this thread


On December 02, 2004 at 17:39:01, Gerd Isenberg wrote:

>On December 02, 2004 at 14:20:40, Vasik Rajlich wrote:
>
>>On December 02, 2004 at 12:51:28, Gerd Isenberg wrote:
>>
>>><snip>
>>>>>Hi Vas,
>>>>>
>>>>>yes, you are right. In a rotated bitboard environment the x^(x-2) trick doesn't
>>>>>makes sense, because a lookup already provides attacks in both opposite
>>>>>directions on one ray for one particular sliding piece.
>>>>>
>>>>>In my current mmx-implementation i don't use rotated attack lookups anymore but
>>>>>fillbased direction wise attack getters (Kogge-Stone and dumb7fill).
>>>>>As you can see, the fillbased attack getters work with bitboard parameters
>>>>>instead of a singular square index, and are able to generate attacks of sets,
>>>>>multiple pieces (rooks/queen for straight directions, bishops/queen for diagonal
>>>>>directions).
>>>>>
>>>>>Those fill based routines are great to get sets of taboo squares for the
>>>>>opposite king and to get pinned pieces - and to get something like a progessive
>>>>>mobility, e.g. a set of squares rooks may reach in two or even more moves etc..
>>>>>
>>>>>I use the x^(x-2) trick to substitute following Kogge-Stone fill-routine, which
>>>>>saves a few cycles for this particular direction:
>>>>>
>>>>>// Kogge-Stone implementation by Steffan Westcott
>>>>>
>>>>>BitBoard getRightRookAttacks (BitBoard rooks, BitBoard empty)
>>>>>{
>>>>>  return shiftRight(fillRightOccluded(rooks, empty));
>>>>>}
>>>>>
>>>>>BitBoard shiftRight (BitBoard b)
>>>>>{
>>>>>  return (b<<1) & 0xfefefefefefefefe;
>>>>>}
>>>>>
>>>>>BitBoard fillRightOccluded(BitBoard g, BitBoard p)
>>>>>{
>>>>>  p &= 0xfefefefefefefefe;
>>>>>  g |= p & (g << 1);
>>>>>  p &= (p << 1);
>>>>>  g |= p & (g << 2);
>>>>>  p &= (p << 2);
>>>>>  return g |= p & (g << 4);
>>>>>}
>>>>
>>>>Aha. So, you have also something like:
>>>>
>>>>BitBoard getALLRookAttacks (BitBoard rooks, BitBoard empty)
>>>>{
>>>>return
>>>>      getRightRookAttacks (rooks, empty) |
>>>>      getLeftRookAttacks (rooks, empty) |
>>>>      getUpRookAttacks (rooks, empty) |
>>>>      getDownRookAttacks (rooks, empty) ;
>>>>}
>>>>
>>>>And then you can do things like:
>>>>
>>>>Bitboard two_moves_away = getALLRookAttacks (getALLRookAttacks (rooks, empty),
>>>>empty);
>>>
>>>
>>>That's the idea.
>>>
>>>>
>>>>etc ...
>>>>
>>>>I'd guess this is slower than a lookup for a one-bit bitboard, but more
>>>>efficient once the mask starts to become more populated.
>>>
>>>Yes. I even tried it with single pupulated pieces (1ULL<<sq) in my former
>>>rotated lookup approach. Queen attacks (eight directions) is IIRC something like
>>>100 cycles with mmx-assembly (branchless and up to four mmx-instructions
>>>simultaniously).
>>>
>>>Huge 64-bit register files are necessary (AMD64, MMX, SSE2, Altivec, Itanium).
>>>Of couse it sucks on x86-32 with the few 32-bit gp-registers.
>>>But of course, 1/1 replacement of rotated attack getters with fill-routines is
>>>not the smartest way...
>>>
>>
>>As far as I can see, in order to generate moves, you'll need to use the (1<<sq)
>>bitboards. Maybe it makes sense to keep the lookup tables for move gen, and the
>>fill routines when you need to "fill" starting with >1 bit.
>
>
>That is a very reasonable approach - a pure fill approach requires some redesign
>to become efficient.
>
>With fill routines you are more in the pure bitboard centric world of sets
>without the need of converting single bits into square metric by "dead slow"
>bitscan ;-)
>

Hmmm. I follow the reference :) - and there is some truth to it. The approach of
bitscanning, then looking up each square, "feels" kind of slow. I am not sure
how you can avoid it for generating moves, though.

>>
>>>
>>>>
>>>>This might be an interesting tool to do some sort of "strategic reasoning". For
>>>>example, pass (!pawns) into the second parameter, or even (! (very fixed
>>>>pawns)), etc.
>>>
>>>Exactly - one may even "and" the "atacks in one set" with other taboo and
>>>occupied by own sets...
>>>
>>
>>Actually yes I can see a ton of ways to use this. The question is which ones are
>>really helpful eval terms.
>
>
>I guess there are a few conditional ones looking for trapped sliders and looking
>for sliding trajectories against the opposite king or the ability to control far
>stopp squares of passers in at least two moves etc.
>
>But yes, difficult stuff.

I became convinced quite a long time ago that bitboards would shine once you
start to make a really good evaluation - in this I agree with Bob Hyatt.

Unfortunately at the moment Rybka follows an even more basic law: if your
evaluation is bad, at least keep it cheap ;) There's nothing worse than a big
mess of untested eval terms ...

>
>
>>
>>>>
>>>>BTW I don't see how to get pins from this - unless you will manually unset bits
>>>>in the "empty" bitboard ...
>>>
>>>No - you have only to consider the king as meta-slider as well and then "and"
>>>opposite directions, e.g. for one of eight possible pin-directions:
>>>
>>>pinnedRem[RIGHT] = getRightRookAttacks (opprooks|oppqueens, empty)
>>>                 & getLeftRookAttacks  (ownKing, empty);
>>>
>>>pinned[ownside][RIGHT] = pinnedRem[RIGHT] & ownPieces;
>>>remChe[oppside][RIGHT] = pinnedRem[RIGHT] & oppPieces;
>>>... all branchless stuff
>>>
>>
>>Aha, that's nice. You even have the pinning direction.
>>
>>In fact you'd probably do:
>>pinnedRem[RIGHT_LEFT] =
>>   (getRightRookAttacks (opprooks|oppqueens, empty)
>>  & getLeftRookAttacks  (ownKing, empty))
>>|  (getLeftRookAttacks (opprooks|oppqueens, empty)
>>  & getRightRookAttacks  (ownKing, empty))
>>
>>and ditto for UP_LEFT. I can't see a reason to keep RIGHT and LEFT separate.
>>
>
>hmm... not sure, the disjoint directions are easier to handle to find the
>sliding pinner later. The popularity is never greater one - simpler, branchless
>coding.
>
>Of course in this LEFT_RIGHT case you can easily separate both direction wise
>sets by masking with ownKBB-1 or ~(ownKbb-1). But extracting unified sets is not
>often so simple, at least not so simple as to waste some memory keeping disjoint
>sets and to unify by "or" on the fly if required.
>

Yes, that's true. I thought that the sliding pinner can be found just as easily
for LEFT_RIGHT as for just LEFT, ie.

Bitboard pinner = opp_rooks | getLeftRightRookAttacks (l_r_pinned_piece, empty)

vs

Bitboard pinner = opp_rooks | getRightRookAttacks (l_pinned_piece, empty)

but actually getRightRookAttacks is cheaper than getLeftRightRookAttacks.

It's also true that it's easy to | the masks - so maybe it's better to keep them
as granular as possible.

>
>>>In case of a sliding check on this direction, pinnedRem[RIGHT] contains a set of
>>>empty squares between checking piece and checked king, which also is usefull.
>>>With fillalgos it is important to keep disjoint directions.
>>>
>>
>>True.
>>
>>>In opposite to Steffan i like to keep separated attacks for each sliding piece
>>>as well - therefore i'm playing with a quad-bitboard approach.
>>>The idea is to pass a quad-bitboard to fill-routines and to distinguish between
>>>attacks of 15 different pieces.
>>
>>Ok here you lost me :)
>
>hehe - i'm in the mood to explain it a bit,
>also to reflect it by myself ;-)
>
>Imagine this stange position occurs during search:
>
>[D] 8/1R2rk2/8/3q4/8/8/r1R4K/6Q1 w - -
>
>After there is no early return in search or qsearch, due to repetition, hash-hit
>or lazy leaf cut (depth <= 0 && lazyOrHashedEva >= beta ),
>i need attack maps for pinned piece detection, eval and (legal) movegen
>purposes.
>
>Two fills per direction ([RB]Q,K) for pinned piece detection.
>Two additional fills (later for movegen) if we like to distinguish remove
>checking moves.
>
>In eval it is nice to have all eight attack sets piece wise, to get trapped
>pieces. Single fills per piece and direction. Like rotated, one has to traverse
>all eight man, including queen-like meta-king attacks as part of a kingsafety
>term and determing sliding check targets), but one may safe the bitscan and only
>isolate least significant one-bit by (x & -x).
>
>The eval part may even be skipped as well, due to laziness or eval hash hit. The
>relation of total nodes searched, lazy cut-nodes and full eval nodes is
>interesting here.
>
>So conditionally eight fills per direction for eval.
>
>If no regulary eval leaf cut occurs, one has to traverse three white pieces for
>sliding move generation. This might be implemented at once, or incrementally to
>safe work in cut nodes.
>
>Up to more three fills per direction here.
>
>In best case there are two fills per dir with approach in mind, in worst case 8
>or up to 2+8+2+3 = 15.
>
>Instead of passing 8 or even 15 bitboards to one Kogge-Stone-routine, processing
>n-times the empty propagator, i consider one quadword as kogge-stone generator.
>The eight pieces above and empty square are coded in four bits eg. in following
>manner:
>
>0000 empty
>0001 wR1
>0010 wR2
>0100 wQ
>0111 wK
>1001 bR1
>1010 bR2
>1100 bQ
>1111 bK
>
>Those bits are arranged vertically in four bitboards,
>the quad bitboard for straight sliders:
>
>     0123456789...........63
>g0   00....0.1............0 // R1 or K
>g1   00....0.0............0 // R2 or K
>g2   00....1.0............0 // Q  or K
>gb   00....0.1............0 // black
>           | |
>           | bK on a2
>           wQ on g1
>
>Now one quad-bitboard kogge-stone fill routine looks like
>
>QuadBitBoard fillRightOccluded(QuadBitBoard g, BitBoard p)
>{
>  p    &= 0xfefefefefefefefe;
>  g.g0 |= p & (g.g0 << 1);
>  g.g1 |= p & (g.g1 << 1);
>  g.g2 |= p & (g.g2 << 1);
>  g.gb |= p & (g.gb << 1);
>  p    &= (p << 1);
>  g.g0 |= p & (g.g0 << 2);
>  g.g1 |= p & (g.g1 << 2);
>  g.g2 |= p & (g.g2 << 2);
>  g.gb |= p & (g.gb << 2);
>  p    &= (p << 2);
>  g.g0 |= p & (g.g0 << 4);
>  g.g1 |= p & (g.g1 << 4);
>  g.g2 |= p & (g.g2 << 4);
>  g.gb |= p & (g.gb << 4);
>  return g;
>}
>
>with some nice prospects of parallel execution, specially considering parallel
>work of different directions in disjoint execution units like 64-bit gp and/or
>mmx as well as SSE2.
>
>Disjoint piece attacks may easily be reconstrcuted on the fly with a few bitwise
>operations.
>
>The result of those routines is probably stored in some kind of thread-global
>memory and therefore with limited validity during a search routine.
>
>I even consider to do some parts of this stuff before probing the prefetched
>main-tt to hide "huge" latencies and to have a better base to decide about lazy
>eval and possible early mate or stalemate detection.
>
>I hope you got the idea - problem with more than two rooks or more than one
>queen per side. Special handling required for those rare case, since there is no
>unique target, source relationship anymore of the direction wise attacks.
>
>Diagonal "coding" is a bit easier since two bishops per side usually have
>disjoint sets of squares.

Ok, after quite some time :) I think I (somewhat) understand.

IIUC - the "p" parameter to your quad bitboard routines truly must be the set of
"empty" squares - no special eval tricks passing it (~pawns) are possible, since
then the bit codes would start overwriting each other. Also, the g parameter
truly must be the set of squares on which those pieces sit - you will no longer
be able to nest calls to this routine, because again the bits will start
overwriting each other.

For instance, no more of things like:

quad-Bitboard two_move_attacks =
fillRightOccluded(fillRightOccluded(majors_quad_bitboard, empty), empty)

A few things are still a bit unclear:

1) Why do you use 4 bits rather than 3 bits to encode the 8 piece combinations?
2) Do you really need a separate entry for white rook 1 and white rook 2? You
could give both of them the same bit sequence - this would also get rid of the
problems with >2 rooks or >1 queen.

Of course, then you won't know which rook attacks the given square. This is the
part I don't really understand w.r.t. move generation - if you do
FillRightOccluded (rooks, empty), you have target squares for rooks, but not
which square is for which rook.

Actually - IIUC - instead of a quad-bitboard you could use a six-bitboard
combination, one bit for each of the six major pieces. Since there would be no
overlap between the bits, you could nest the calls, etc, and maybe still get
some parallel performance benefit.

Vas

>
>
>>
>>But thanks for reminding me why I put bitboards into my program.
>
>No problem ;-)
>
>Cheers,
>Gerd
>
>>
>>Vas
>>><snip>



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.