Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: sliding attacks in 3 lines

Author: Gerd Isenberg

Date: 14:47:09 04/16/04

Go up one level in this thread


On April 16, 2004 at 17:14:14, Andreas Guettinger wrote:

>On April 16, 2004 at 15:51:25, Gerd Isenberg wrote:
>
>>On April 16, 2004 at 06:14:25, Andreas Guettinger wrote:
>>
>>>Could somebody recap how the final macros look like, all inlined?
>>>What bitboards are needed to use the macros?
>>>Is the overhead of maintaining the bitmaps worth it?
>>>Is it a speedup to 0x88? Do you think its faster than a for loop checking for
>>>blocked squares between x and y?
>>>
>>>regards
>>>Andy
>>
>>Hi Andy,
>>
>>actually i would suggest something like this, bitboard return, to additionally
>>check whether the popcount is zero or one, probably looking for xrays, pins and
>>remove checkers.
>>
>>inline
>>BitBoard brq_path_clear(x88sqr gt, x88sqr ls, U64 BitBoard) {
>>   return (isqBit(gt) - 2*isqBit(ls)) & path & all;
>>}
>>
>>BitBoard bstyle_attack(x88sqr x, x88sqr y) {
>>   int dir = direction0x88Diff[0x88+x-y]); // 256 sized lookup
>>   if ( dir & isDiagonalDirection )
>>   {
>>      BitBoard path = pathOfDirection[dir];
>>      if ( x > y ) return brq_path_clear(x, y, path);
>>      return brq_path_clear(y, x, path);
>>   }
>>   return (BitBoard) -1;
>>}
>>
>>Same for rooks, excecpt looking for straight rays.
>>
>>This is cache friendly, has two conditional jumps and some computations.
>>
>>Probably a 64*64 inbetween lookup with x and y is faster, but pollutes the cache
>>32KByte (+4/8/16Kbyte) a bit more.
>>One should try, saves some instructions and missed branches.
>>
>>BitBoard bstyle_attack(square64 x, square64 y) {
>>   int dir = direction[x][y]; // 4K char(short/int lookup
>>   if ( dir & isDiagonalDirection )
>>      return inBetween[x][y] & all; // 32K lookup
>>   return (BitBoard) -1;
>>}
>>
>>Or even faster and complete branchless with two proper initialized 32KByte
>>arrays (contains -1 for non common squares for diagonal and straight):
>>
>>// returns all if non common diagonals
>>BitBoard bstyle_attack(square64 x, square64 y) {
>>  return inBetweenButOnlyDiagonals[x][y] & all;
>>}
>>
>>BitBoard rstyle_attack(square64 x, square64 y) {
>>  return inBetweenButOnlyStraight[x][y] & all; // 32K lookup
>>}
>>
>>Ok, a tiny loop looking on the board is not that bad and very resource friendly.
>>But you have up to two conditions (target square reached, occupied)per run. And
>>the number of runs depends on the distance (up to six runs) and whether you got
>>an early break. In best case with two adjacent squares it is faster.
>>
>>So you may try it, it depends on so much "chaotical things" inside a concrete
>>chess program and of course on the target CPU, cache size etc...
>>
>>Cheers,
>>Gerd
>
>
>I currently have a loop that checks for inbetween fields.
>
>// can_reach returns true if a piece on fromSq can reach targetSq,
>// and no figure is blocking the path between fromSq and targetSq
>bool can_reach(const position_t *ppos, int fromSq, int targetSq)
>{
>	int way = targetSq - fromSq + 128;
>	int piece = fig_table[ppos->board[fromSq]->piece+6];
>
>	if ((attackmove[way] & piece) == piece)  // table lookup
>	{
>		fromSq += attackstep[way];   // table lookup for moving vector
>		while (fromSq != targetSq)
>		{
>			if (ppos->board[fromSq] != NULL)
>				return false;		// path is blocked
>
>			fromSq += attackstep[way];
>		}
>		return true;
>	}
>	return false;
>}
>
>
>Currently I get along with no bitboards at all. But maintaning one allPieces
>bitboard in makemove would not be very costly.
>I use the above function for generating attacks on the fly (in SEE), for inCheck
>function to check if any opponent piece can reach the king, but the function
>could even be used in genmove, generating recaptures etc.
>
>regards
>Andy


Yes, i even thought about that simple loop too.

The while loop is really cheap, and may fit in 12 bytes or so.

loop:
  cmp  [regBoard + regX], regEmpty
  jnz  false
start:
  add  regX, regInc
  cmp  regX, regY
  jne  loop
true:
...
false:

I guess P4's trace cache is very happy about that.
Don't know exactly whether AMD64 prefetches the same 16 bytes all the time.
If the branches are predicted correctly, foreward not taken and backward taken,
it might be hell fast, due to board is most likely a L1-hit, only 1-4
cachelines, always used. How expensive is the final miss prediction?

But yes, most probably and i feel rather confident now for certain CPUs, you and
Christophe Theron are right about this boolean function for non bitboard based
programs. And it seems not to be the hottest spot to optimize...

Cheers,
Gerd



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.