Author: Andreas Guettinger
Date: 15:09:14 04/16/04
Go up one level in this thread
On April 16, 2004 at 17:47:09, Gerd Isenberg wrote:
>On April 16, 2004 at 17:14:14, Andreas Guettinger wrote:
>
>>On April 16, 2004 at 15:51:25, Gerd Isenberg wrote:
>>
>>>On April 16, 2004 at 06:14:25, Andreas Guettinger wrote:
>>>
>>>>Could somebody recap how the final macros look like, all inlined?
>>>>What bitboards are needed to use the macros?
>>>>Is the overhead of maintaining the bitmaps worth it?
>>>>Is it a speedup to 0x88? Do you think its faster than a for loop checking for
>>>>blocked squares between x and y?
>>>>
>>>>regards
>>>>Andy
>>>
>>>Hi Andy,
>>>
>>>actually i would suggest something like this, bitboard return, to additionally
>>>check whether the popcount is zero or one, probably looking for xrays, pins and
>>>remove checkers.
>>>
>>>inline
>>>BitBoard brq_path_clear(x88sqr gt, x88sqr ls, U64 BitBoard) {
>>> return (isqBit(gt) - 2*isqBit(ls)) & path & all;
>>>}
>>>
>>>BitBoard bstyle_attack(x88sqr x, x88sqr y) {
>>> int dir = direction0x88Diff[0x88+x-y]); // 256 sized lookup
>>> if ( dir & isDiagonalDirection )
>>> {
>>> BitBoard path = pathOfDirection[dir];
>>> if ( x > y ) return brq_path_clear(x, y, path);
>>> return brq_path_clear(y, x, path);
>>> }
>>> return (BitBoard) -1;
>>>}
>>>
>>>Same for rooks, excecpt looking for straight rays.
>>>
>>>This is cache friendly, has two conditional jumps and some computations.
>>>
>>>Probably a 64*64 inbetween lookup with x and y is faster, but pollutes the cache
>>>32KByte (+4/8/16Kbyte) a bit more.
>>>One should try, saves some instructions and missed branches.
>>>
>>>BitBoard bstyle_attack(square64 x, square64 y) {
>>> int dir = direction[x][y]; // 4K char(short/int lookup
>>> if ( dir & isDiagonalDirection )
>>> return inBetween[x][y] & all; // 32K lookup
>>> return (BitBoard) -1;
>>>}
>>>
>>>Or even faster and complete branchless with two proper initialized 32KByte
>>>arrays (contains -1 for non common squares for diagonal and straight):
>>>
>>>// returns all if non common diagonals
>>>BitBoard bstyle_attack(square64 x, square64 y) {
>>> return inBetweenButOnlyDiagonals[x][y] & all;
>>>}
>>>
>>>BitBoard rstyle_attack(square64 x, square64 y) {
>>> return inBetweenButOnlyStraight[x][y] & all; // 32K lookup
>>>}
>>>
>>>Ok, a tiny loop looking on the board is not that bad and very resource friendly.
>>>But you have up to two conditions (target square reached, occupied)per run. And
>>>the number of runs depends on the distance (up to six runs) and whether you got
>>>an early break. In best case with two adjacent squares it is faster.
>>>
>>>So you may try it, it depends on so much "chaotical things" inside a concrete
>>>chess program and of course on the target CPU, cache size etc...
>>>
>>>Cheers,
>>>Gerd
>>
>>
>>I currently have a loop that checks for inbetween fields.
>>
>>// can_reach returns true if a piece on fromSq can reach targetSq,
>>// and no figure is blocking the path between fromSq and targetSq
>>bool can_reach(const position_t *ppos, int fromSq, int targetSq)
>>{
>> int way = targetSq - fromSq + 128;
>> int piece = fig_table[ppos->board[fromSq]->piece+6];
>>
>> if ((attackmove[way] & piece) == piece) // table lookup
>> {
>> fromSq += attackstep[way]; // table lookup for moving vector
>> while (fromSq != targetSq)
>> {
>> if (ppos->board[fromSq] != NULL)
>> return false; // path is blocked
>>
>> fromSq += attackstep[way];
>> }
>> return true;
>> }
>> return false;
>>}
>>
>>
>>Currently I get along with no bitboards at all. But maintaning one allPieces
>>bitboard in makemove would not be very costly.
>>I use the above function for generating attacks on the fly (in SEE), for inCheck
>>function to check if any opponent piece can reach the king, but the function
>>could even be used in genmove, generating recaptures etc.
>>
>>regards
>>Andy
>
>
>Yes, i even thought about that simple loop too.
>
>The while loop is really cheap, and may fit in 12 bytes or so.
>
>loop:
> cmp [regBoard + regX], regEmpty
> jnz false
>start:
> add regX, regInc
> cmp regX, regY
> jne loop
>true:
>...
>false:
>
>I guess P4's trace cache is very happy about that.
I don't know about P4 or AMD, but my PPC is happy with the function.
>Don't know exactly whether AMD64 prefetches the same 16 bytes all the time.
>If the branches are predicted correctly, foreward not taken and backward taken,
>it might be hell fast, due to board is most likely a L1-hit, only 1-4
>cachelines, always used. How expensive is the final miss prediction?
Well, actually I forgot about the inline in front of the bool... I usually even
type out the code, a for loop or while didn't make much difference for me. But
inlineing the code brought me a full 50% speedup for perft (which is dominated
by the inCheck() function in my case when profiling). For the incheck it would
look something like:
// if white to move
for (int i = 0; i < ppos->bfigs; i++) // cycle all black pieces
{
fromSq = ppos->blackplist[i].location;
way = kingSq - fromSq + 128;
piece = fig_table[ppos->board[fromSq]->piece+6];
if ((attackmove[way] & piece) == piece)
{
int cFrom = fromSq;
for (;;)
{
cFrom += attackstep[way];
if (cFrom == kingSq)
return true;
if (ppos->board[cFrom] != NULL)
break; // path is blocked
}
}
}
>But yes, most probably and i feel rather confident now for certain CPUs, you and
>Christophe Theron are right about this boolean function for non bitboard based
>programs. And it seems not to be the hottest spot to optimize...
>
>Cheers,
>Gerd
In contrast to perft, the inCheck function is much less important for normal
chess. And for SEE the most critical code is too skip SEE whenever possible. :)
regards
Andy
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.