Author: Gerd Isenberg
Date: 04:39:10 02/26/06
Hi all,
First greetings and good luck to all cct8 participants!
Second some thoughts and suggestions for improvements of Dann's switch-approach:
Dann's approach, to do a switch of a masked row per square and ray-kind with 128
bitboard cases does almost the same as rotated lookups with rotated occupied
state and square - it may supply appropriate attack bitboards for that ray-kind
and square - and even other precalculated information such as possible covered
xray information. Correct me if i'm wrong, Dann.
Despite it was interesing to see how the compiler translates a switch with 128
64-bit cases in a binary search manner, Dann's switch-approach covers a lot of
branch target buffer slots and branch prediction ressources. Also each maked
move inside the search changes almost 7 occupied rays so that some
miss-predictions are likely in the compare/conditional-jump chains of the
switches.
Therefor the idea to fold the up to seven "ray-masked" occupied bits down to a
7-bit range (0..127) - per square and one of four kind of rays - to supply e.g.
some De Bruijn hashing.
We can map all 128 masked ray bitboards (over all squares and ray kinds) very
easily (lot of fitting De Bruijns) to 8-bit range - but we need most often
different de Bruijns:
input:
sq ::= 0..63 square index of a sliding piece
dir ::= 0..3 kind of ray (two diagonals, horicontal, vertical)
+--------------------------------------------------------------------------+
| occIdx256 ::= (occupiedBB & rayMask[sq][dir]) * deBruijn[sq][dir]) >> 8; |
+--------------------------------------------------------------------------+
Suppose we have rotated like attack sets, we need
BitBoard preCalulatedAttacks[64][4][256];
which has a size of 512 KByte with 256 KByte usefull information because only
128 of 256 states are relevant.
Thus, getting sliding atttacks per square and ray kind works that way:
BitBoard attacks = preCalulatedAttacks[sq][dir][occIdx256];
I think because 64-bit multiplicatiion is only 4 amd64 cycles, this is a
competetive alternative to rotated lookups - we need only one (none-rotated)
occupied bitboard.
We can also reduce the attacked loopup tables to 256 KByte by performing a
further indirection by a 256 to 128 lookup per square and ray-kind, which needs
a 64 KByte table - but i think that don't pays off since we usually don't profit
from fetching several entries from one cacheline.
-------------------------------------------------------------------------------
Of course it would be nice to avoid memory waste and to map masked occupied
states directly to a 7-bit range. Indeed i found De Bruijns (rayMagic) that fit
those requirement for most squares and ray-kinds:
+--------------------------------------------------------------------------+
| occIdx128 ::= (occupiedBB & rayMask[sq][dir]) * rayMagic[sq][dir]) >> 7; |
+--------------------------------------------------------------------------+
Some further rayMagics were found with "modified" De Bruijns, such as
bool propertyCheck(BitBoard dB)
{
for (int x=0; x < N; x++) {
if ( rayCheck (db) )
return true;
dB -= someConstBB }
}
That starts to take some hours computing time, while iterating over all de
Bruijns. If you try to do you own "research" here is the already posted but
slightly optimized De Bruijn class for you:
-------------------------------------------------------------------------------
/* usage:
DeBruijnGenerator deBruijn;
genDeBruijn(6); // for 64-6 sequences
*/
class DeBruijnGenerator
{
protected:
BitBoard lck, seq;
unsigned int depth, mask;
int count;
protected:
void __fastcall searchDeBruijn(unsigned int i) { // i = (int)(seq >> depth) &
mask;
if ( depth > 2) {
unsigned int j = (i+i)&mask; // next even index
lck ^= pow2[i]; // lock current index
depth-= 2; // sequence index _East_One
if ( (lck & pow2[j]) == 0 ) { // next even index unlocked?
unsigned int k = (j+j)&mask; // next even index
lck ^= pow2[j]; // lock current index
if ( (lck & pow2[k]) == 0 ) { // next even index unlocked?
seq &= ~pow2[depth+1]; // "append" zero
seq &= ~pow2[depth]; // "append" zero
searchDeBruijn(k); // do recursive search with next even index
}
if ( (lck & pow2[k+1]) == 0 ) { // next odd index unlocked?
seq &= ~pow2[depth+1]; // "append" zero
seq |= pow2[depth]; // "append" one
searchDeBruijn(k+1); // do recursive search with next odd index
}
lck ^= pow2[j]; // unlock current index
}
if ( (lck & pow2[j+1]) == 0 ) { // next odd index unlocked?
unsigned int k = (j+j+2)&mask; // next even index
lck ^= pow2[j+1]; // lock current index
if ( (lck & pow2[k]) == 0 ) { // next even index unlocked?
seq |= pow2[depth+1]; // "append" one
seq &= ~pow2[depth]; // "append" zero
searchDeBruijn(k); // do recursive search with next even index
}
if ( (lck & pow2[k+1]) == 0 ) { // next odd index unlocked?
seq |= pow2[depth+1]; // "append" one
seq |= pow2p[depth]; // "append" one
searchDeBruijn(k+1); // do recursive search with next odd index
}
lck ^= pow2[j+1]; // unlock current index
}
depth+=2; // sequence index _West_One
lck ^= pow2[i]; // unlock current index
} else if ( depth > 1 ) {
unsigned int j = (i+i)&mask; // next even index
lck ^= pow2[i]; // lock current index
depth--; // sequence index _East_One
if ( (lck & pow2[j]) == 0 ) { // next even index unlocked?
seq &= ~pow2[depth]; // "append" zero
searchDeBruijn(j); // do recursive search with next even index
}
if ( (lck & pow2[j+1]) == 0 ) { // next odd index unlocked?
seq |= pow2[depth]; // "append" one
searchDeBruijn(j+1); // do recursive search with next odd index
}
depth++; // sequence index _West_One
lck ^= pow2[i]; // unlock current index
} else if ( (lck & pow2[(i+i+1)&mask]) == 0 ) {
deBruijnFound(seq); // bit zero was already set during initialization
count++;
}
}
// overload this function in a derived class to look for
// desired properties
//======================================================
virtual void deBruijnFound(BitBoard deBruijn) const {};
public:
// constructor
//============
DeBruijnGenerator() {count = 0;}
// and generator routine
//======================
void genDeBruijn(unsigned int expOfTwo) {
if ( expOfTwo > 6 ) {
printf ("%d exceeds max exponent 6 for 64/6 sequence\n", expOfTwo);
expOfTwo = 6;
}
unsigned int powOfTwo = 1 << expOfTwo;
seq = 1; // the initial sequence
lck = pow2[powOfTwo/2]; // lock last index for performance
depth = powOfTwo-expOfTwo; // initial depth for expOfTwo leading zeros
mask = powOfTwo-1; // mask
searchDeBruijn(0); // start with index 0 due to expOfTwo leading zeros
}
};
// some const arrays
BitBoard pow2[64] =
{
0x0000000000000001,0x0000000000000002,0x0000000000000004,0x0000000000000008,
0x0000000000000010,0x0000000000000020,0x0000000000000040,0x0000000000000080,
0x0000000000000100,0x0000000000000200,0x0000000000000400,0x0000000000000800,
0x0000000000001000,0x0000000000002000,0x0000000000004000,0x0000000000008000,
0x0000000000010000,0x0000000000020000,0x0000000000040000,0x0000000000080000,
0x0000000000100000,0x0000000000200000,0x0000000000400000,0x0000000000800000,
0x0000000001000000,0x0000000002000000,0x0000000004000000,0x0000000008000000,
0x0000000010000000,0x0000000020000000,0x0000000040000000,0x0000000080000000,
0x0000000100000000,0x0000000200000000,0x0000000400000000,0x0000000800000000,
0x0000001000000000,0x0000002000000000,0x0000004000000000,0x0000008000000000,
0x0000010000000000,0x0000020000000000,0x0000040000000000,0x0000080000000000,
0x0000100000000000,0x0000200000000000,0x0000400000000000,0x0000800000000000,
0x0001000000000000,0x0002000000000000,0x0004000000000000,0x0008000000000000,
0x0010000000000000,0x0020000000000000,0x0040000000000000,0x0080000000000000,
0x0100000000000000,0x0200000000000000,0x0400000000000000,0x0800000000000000,
0x1000000000000000,0x2000000000000000,0x4000000000000000,0x8000000000000000
};
BitBoard pow2p[64] =
{
0x0000000000000003,0x0000000000000006,0x000000000000000c,0x0000000000000038,
0x0000000000000030,0x0000000000000060,0x00000000000000c0,0x0000000000000180,
0x0000000000000300,0x0000000000000600,0x0000000000000c00,0x0000000000001800,
0x0000000000003000,0x0000000000006000,0x000000000000c000,0x0000000000018000,
0x0000000000030000,0x0000000000060000,0x00000000000c0000,0x0000000000180000,
0x0000000000300000,0x0000000000600000,0x0000000000c00000,0x0000000001800000,
0x0000000003000000,0x0000000006000000,0x000000000c000000,0x0000000018000000,
0x0000000030000000,0x0000000060000000,0x00000000c0000000,0x0000000180000000,
0x0000000300000000,0x0000000600000000,0x0000000c00000000,0x0000001800000000,
0x0000003000000000,0x0000006000000000,0x000000c000000000,0x0000018000000000,
0x0000030000000000,0x0000060000000000,0x00000c0000000000,0x0000180000000000,
0x0000300000000000,0x0000600000000000,0x0000c00000000000,0x0001800000000000,
0x0003000000000000,0x0006000000000000,0x000c000000000000,0x0018000000000000,
0x0030000000000000,0x0060000000000000,0x00c0000000000000,0x0180000000000000,
0x0300000000000000,0x0600000000000000,0x0c00000000000000,0x1800000000000000,
0x3000000000000000,0x6000000000000000,0xc000000000000000,0x8000000000000000
};
This page took 0.02 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.