Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Faster Board Representations/Move Generators

Author: Gerd Isenberg

Date: 13:30:12 02/24/06

Go up one level in this thread


A quadbitboard is simply an array of four bitboards.
Empty square and piececodes resist in vertical nibbles (4 bits).
For instance the initial position with arbitrary piece codes, but zero as empty
square:

white  black
0010 - 0011 - king
0100 - 0101 - pawn
0110 - 0111 - knight
1000 - 1001 - bishop
1000 - 1001 - bishop
1010 - 1011 - rook
1100 - 1101 - queen

   h8                                        a1=0
q0  1111111111111111.......0000000000000000
q1  1101001100000000.......0000000011010011
q2  0100101011111111.......1111111101001010
q3  1010110100000000.......0000000010101101

Takes 256 bits.

Fill-routines like Steffan Westcott's genious Kogge-Stone-Fill have the
advantage to work setwise. So there is no need to isolate or bitscan (log2)
several man of one piece-kind. One may even pass combined queen/bishop- or
queen/rook-sets to appropriate fill-routines for one directed ray.
For instance a north-east fill with bishops and queen:

__forceinline BitBoard noEastOne (BitBoard b) {return (b<<9) & notA;}

__forceinline
BitBoard noEastOccl(BitBoard gen, BitBoard pro) {
	pro  = pro &  notA;
	gen |= pro & (gen <<  9);
	pro  = pro & (pro <<  9);
	gen |= pro & (gen << 18);
	pro  = pro & (pro << 18);
	gen |= pro & (gen << 36);
	return gen;
}

__forceinline BitBoard noEastSld (BitBoard set, BitBoard empty) {
	return noEastOne(noEastOccl(set, empty));
}

[d] rn3rk1/ppb2ppp/2p5/8/8/1P4P1/PBQ2PBP/R5K1 w - - 0

0 0 0 0 0 0 0 0    0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0    0 0 0 0 0 0 1 1
0 0 0 0 0 0 0 0    0 0 0 0 0 1 1 0
0 0 0 0 0 0 0 0    0 0 0 0 1 1 0 0
0 0 0 0 0 0 0 0    0 0 0 1 1 0 0 0
0 0 0 0 0 0 0 0    0 0 1 1 0 0 0 1
0 1 1 0 0 0 1 0    0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0    0 0 0 0 0 0 0 0

The small problem with move-generation is to map the source square of a piece
while traversing attacked target squares. Because there is no one to one
relationship. The general solution is a "fill back" of the target square with an
intersection (and) of own pieces.

If you do kogge-stone with quadbitboards, you are able to mutex fill up to 15
bitboards with disjoint sets at once. Of course like fill-stuff in general
64-bit architectures with some registers are required - or architectures with
vector registers such as intel and amd's SSE2 or Altivec for G5.

Here a sample implmementation with a template class T either XMM or GPR128, both
derived from a class/uinion with 128-bit size doing bytewise vector arithmetik:

template <class T> __forceinline
void noEastAttacks(QBB& t, const QBB& s, const BB &empty) {
	T* pt = (T*)&t;
	T g0((T&)s.bb[0]);
	T g1((T&)s.bb[2]);
	T p(empty);
	p  = butNotA(p);
	g0 |= p & (g0 <<  9);
	g1 |= p & (g1 <<  9);
	p   = p & ( p <<  9);
	g0 |= p & (g0 << 18);
	g1 |= p & (g1 << 18);
	p   = p & ( p << 18);
	g0 |= p & (g0 << 36);
	g1 |= p & (g1 << 36);
	// notA by bytewise add
	pt[0] = (g0+g0) << 8;
	pt[1] = (g1+g1) << 8;
}

The premise is to boost ipc to the max. No branches, fast aligned local L1
accesses if possible and trying to hide eventually expensive memory accesses
with pure register calculation. So mixing and scheduling fillstuff for several
directions on different register-sets like xmm, mmx and/or 64-bit general
purpose makes sense.

Quadbitboard filler combine the setwise advantage of fill-routines with getting
disjoint sets and one-to-one relationship of source- and target-squares. Of
course some muxing and demuxing is necessary here and there.

I use a less dense sliding piece code to fill sliders and king as meta-slider:
diagonals:
 0 : black
 1 : king
 2 : queen
 3 : slider (queen or bishop)
straight:
 0 : black
 1 : king or rook 1
 2 : queen
 3 : slider (queen or rook)

This allows efficient access of combined all sliding attacks (bitboard 3) as
well as disjoint attacks. rook1 are the single isalated "least significant"
rooks of both sides (r & -r) to get usually disjoint rook attacks (if <= 2 rooks
per side). Bishops are separated by square colors.

Pins and discovered checker are peanuts now. X-rays as well.
If a slider A on a ray is defended by an friendly slider B, all attacks of A are
shadow attacks of B, and so on...

====

The idea with the De Bruijn multipication for movegen contradicts the general
Westcott approach. He keeps directionwise target sets as a kind unsorted
movelist - also for bookholding reasons for an incremental movegen. De Bruijn
movegen can generate disjoint hashmove, captures with several gains, checks and
quite moves in phases. I think that is quite sufficent - in all-node it makes
sense to generate all at once. So i have SEE for captures but with De Bruijn no
distinct move generation for instance to safe and unsafe target sets. If i like
to distinguish those moves i have to traverse the generated movelist, same to
assign killer- and history scores - piece square scores are integral part of
precalculated movelist items. Of course score assignment may be skipped at all
nodes.

Even with score assignment i think de Bruijn movegen is fine.
Score assignment should be implemented "branchless", there is only one final
loop-break miss-prediction (SSE2-domain as well, same is true for picking
best move out of the sorted movelist).



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.