# Computer Chess Club Archives

## Messages

### Subject: sliding move generation with check information

Author: Gerd Isenberg

Date: 13:45:51 02/22/06

```Hi,

some more elaboration and more detailed explanation on "branchless" sliding move
generation. Hope it will at least be entertaining and eventually will motivate
some to have a closer look to a Westcott-like fillbased bitboard framework.

The idea was motivated by the well known paper:
http://supertech.csail.mit.edu/papers/debruijn.pdf
specially Chapter 4 Indexing two 1's

prolog - move-indices
=====================

The first thing was to save bitscans in my fillbased bitboard framework.
Determing unique move-indices per direction by performing a multiplicative
DeBruijn Hashfunction with "ored" isolated from- and to-bitboards. 224 or 140
possible unique from-to move combinations per straight and diagonal ray:

8*7 + 8*6 + 8*5 + 8*4 + 8*3 +  8*2 +  8*1 = 8*28 = 224
1*7 + 3*6 + 5*5 + 7*4 + 9*3 + 11*2 + 13*1 = 5*28 = 140

In Chapter 4, "Indexing two 1's" the authors of the above paper introduce a
hashfunction to map all 64-bit numbers with at most two 1's
(64*63/2+64+1 == 2081) into a table with 32706 entries:

h(x) ::= (x * 0xE50FA91BE3A25401) >> (64-15);

What, if the both 1's were dependent, such as move from-to bits of a sliding
piece? Should be a much smaller range - but how much?

Some time ago i wrote a recursive deBruijn generator to find factors to map all
possible from-to-bitboards. Each time a deBruijn constant was found at the leaf
of the De Bruijn tree, one or more unambiguously test for all enumerated from-to
sets per direction were applied to look for collisions.

I tried modified (some add and xor) DeBruins, ored "fromto" with distinct
directions, as well as "from" minus "to" with combined opposite directions and a
lot more.

The most compact lookup-arrays was with distinct directions. With a lot of
fitting DeBruijn and modified DeBruijn constants. The 140 diagonals maps into a
1/2K range, the 240 straights into 1K:

strMoveIdx ::= ((strFrom|strTo)*DeBruijn) >> (64-10);
diaMoveIdx ::= ((diaFrom|diaTo)*DeBruijn) >> (64-9);

I intend to live with cute moveindices, without further lookup except for
straight sliding moves. DoMove is basically a 3*16 byte-xor and a 16-byte
and/add - but that is another story...

The movelist idea
=================

Recently i got the idea to map not only single moves but movelists of 0..7
moves, determined a sliding attack ray set. "Surprisingly" at the first glance,
the number of possible none empty ray attack or target sets is exactly the
number of moves per ray. But the number of bits vary from 1..7 (or even 0..7)
considering empty sets.

This 140(+1) or 224(+1) unique move target bitboards map to a 9 or 10 bit index
range as well - fitting DeBruijns are a bit more rarely as for the pure move
indices.

strMoveListIndex ::= (strAttacks*DeBruijn) >> (64-10);
diaMoveListIndex ::= (diaAttacks*DeBruijn) >> (64-9);

Example with north-east direction with "little endian" mapping - white bishop on
b2, black piece on f6 as "blocking" piece:

(
h8=63
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 1 0 0
0 0 0 0 1 0 0 0
0 0 0 1 0 0 0 0
0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
a1=0           h1=7

* 0x0219c7bf2a4dae8b ) >> (64-9)

==> 229  (the 55. from 140 unique indices in a 2**9 == 512 range)

Now, array slot 229 refers the precalculated fixed sized movelist of eight
32-bit integers, up to seven moves and number of moves (always starts with
greatest distance as this is eventually a capture target):

ptr[229] => {b2f6, b2e5, b2d4, b2c3, null, null, null, 4}

Now sliding move generation becomes rather loop-, branch- and bitscan-less:

// e.g. kogge stone fill with single bishop
BB noeast = getNorthEastAttacks(bishop);

BB targets = noeast & ~ownPieces;

idx = (targets * 0x0219c7bf2a4dae8b) >> (64-9);
ptr = NorthEastBishopMoves + NorthEastIdx[idx];

// unconditional move of 7(8) ints
movelist[x+0] = ptr->i[0];
movelist[x+1] = ptr->i[1];
movelist[x+2] = ptr->i[2];
movelist[x+3] = ptr->i[3];
movelist[x+4] = ptr->i[4];
movelist[x+5] = ptr->i[5];
movelist[x+6] = ptr->i[6];

// first is possible capture target
uint piece2cap = board[movelist[x+0].to]; // captured piece or empty

// modify if capture move
movelist[x+0] |= captureFlags[piece2cap];

// increment index from 0..7
x += pt

Unconditional moving seven ints avoids branches and may be done by a special
memcpy with simd-wise with 128-bit SSE2-intrinsics with two aligned movdqa,
while writing unaligned with two movdqu to the movelist. The precalculated moves
inside a movelist may contain some static move score, based on pure geometrical
center distance relation as a rough estimate for some movesorting. For a pure
capture search, only one move and increment by max(1,ptr[idx]->i[7]) is
necessary.

So far so good - similar is possible with rotated. First one may triple the
rotated lookup-table sizes by introducing not only combined opposite direction
attacks but also the two disjoint. Second on may also index precalculated
movelists with occupied state and square index - even if handling captures or
whether attacked squares ar occupied by own pieces becomes a bit more
complicated. Keeping direction wise disjoint attacks has some merits - also for
determing pinned pieces, remove checker x-rays etc.

more sophisticated idea and further brainstorming
=================================================

I like to determine checking moves during generation in advance.
May be it is not so important for move ordering (trying checks earlier might
produce bigger trees), but to eventually avoid possible futility pruning near or
at the tips.

Imagine the above north-east set of a white b2 bishop and the black king on b8
so that the white bishop may check on e5 (x):

h8=63
0 k 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 1 0 0
0 0 0 0 x 0 0 0
0 0 0 1 0 0 0 0
0 0 1 0 0 0 0 0
0 B 0 0 0 0 0 0
0 0 0 0 0 0 0 0
a1=0           h1=7

I have all bishop-like (as well as rook-like) attacks from the opposite king
(all 4 diagonal directions combined) - a set of possible check targets for
bishops and queens . If the intersection of the disjoint direction attack set
with the appropriate checkset is not zero, there is a one square intersection -
otherwise we could capture the king.

I was thinking about how to squeeze this possible one-bit set, a check target
square, to the original attack set but still keeping unique sets with a perfect
hash-function.

idx = (modifiedTargets * deBruijn) >> (64-nbitsneeded);

with the check tag already set for B/Q b2e5:

ptr[idx] => {b2f6, b2e5+, b2d4, b2c3, null, null, null, 4}

or alternative, if check moves were already generated in an earlier pass:

ptr[idx] => {b2f6, b2d4,  b2c3, null, null, null, null, 3}

How to modify targets and how many bits are needed for a most compact value
range? The number of 224 distinct rook targets per direction becomes 4 times
bigger - 896 with up to one check-target per set. The number of 140 bishop
targets becomes only 3.4 times bigger - 476 with up to one check-target per set!

Of course one couldn't modify target by xoring checkset, because sets become
obviously ambiguous if outer bits are affected. I thought about oring with some
left and some right shift of the checkset and that seems to work - no matter how
small and big the check square index is (x<<left)|(x>>right) always becomes none
zero if x is power of two and left+right < 64:

modifiedTargets ::= targets | (x<<left) | (x>>right);

I changed the deBruijnFound function of my De Bruijn generator with two
left/right shift amount loops, to supply the unambiguously test function not
only with the De Bruijn constant and number of bits of the range, but also with
left/right shift amounts for the checksets.

The diagonal range grows from 9-bit to 12-bit - 8-times bigger, 4K instead of
1/2K to map 476 entries instead of 140 - some diagonals deBruijn:

dir     #bits min max  #   << >> deBruijn
NoEast  12   184  4049 476 33  2 0x021934563a97e7b7
SoEast  12    27  4060 476 34  3 0x0219b1f92ef456a7
SoWest  12   225  4081 476 25  7 0x021937151d2fcf6b
NoWest  12    32  3961 476  1 26 0x021977e71eb4549b

Unfortunately the straight range seem to grow from 10-bit to 14-bit - 16-times

So sliding move generation with check information may look that way:

// e.g. kogge stone fill with single bishop
BB noeast = getNorthEastAttacks(bishop);

BB targets = neast & ~ownPieces;

// if ( targets )

BB checkset = targets & diagonalCheckTargets;
targets |= (checkset<<33) | (checkset>>2);
uint idx = (targets * 0x021934563a97e7b7) >> (64-12);
ptr = NorthEastBishopMovesC + NorthEastIdxPlusC[idx];

// unconditional move of 7(8) ints
movelist[x+0] = ptr->i[0];
movelist[x+1] = ptr->i[1];
movelist[x+2] = ptr->i[2];
movelist[x+3] = ptr->i[3];
movelist[x+4] = ptr->i[4];
movelist[x+5] = ptr->i[5];
movelist[x+6] = ptr->i[6];

// first is possible capture target
uint piece2cap = board[movelist[x+0].to]; // captured piece or empty

// modify if capture move
movelist[x+0] |= captureFlags[piece2cap];

// increment index from 0..7
x += ptr[idx]->i[7]; // += n

Still brainstorming - and corious about my first perft counts.

Cheers,
Gerd

```