Computer Chess Club Archives


Search

Terms

Messages

Subject: optimization on reduced int range - e.g. knight-distance

Author: Gerd Isenberg

Date: 13:53:40 12/19/05


Hi,

i am aware of getting a knight-distance of two squares is not that important -
but entertainment. I am aware of 0x88-difference to index a 15*16 table to get
distance, taxi-distance, absolute rank-file-difference or even the
knight-distance which should be conditionally modified for the one-square
diagonal edge cases, where the knight need four jumps from eg. a1 to b2 instead
of two from b2 to c3.

In my "original" routine the dense of information keeping two int[64]-arrays for
knight-distance 0..6 and boolean information of edges was too low for this rare
code. With the cost of a shift one both arrays are combined to one
amd64-cacheline.

The use of abs-intrinsic from math.h, performing an branchless abs like
(x ^ (x>>31))-(x>>31) with cdq-instruction, seems better to optimize by the
msvc6 compiler than an "handmade" abs.

By using only six or even three (file, rank) significant bits of 32 - instead of
making an explicite signmask for abs via mov,sar31 or cdq (works only with
eax,edx) - the bytewise signmask is implicitly in the upper byte register after
a 32-bit subtraction with eax,ebx,ecx,edx.  Also there is a kind of simd with
one 32-bit xor.

No way to force my compiler to produce such code with char etc. other than
inline assembly. While it is usually a bad idea to work with partial
byte-registers - specially targeting high-byte register before using the whole
32-bits - it works surprisingly well here.

The C-version is 81 byte and takes 16 cycles, while the inline asm is exactly
one amd64 cacheline and takes 11 cycles, both including call/return. Since the
int-variables a,b,c are all used to index a char[64]-array, isn't that
information appropriate to guide the compiler about reduced value range
optimizations?

Cheers,
Gerd


#include <math.h>

const unsigned char knedge[64] = {
  // 2*knight-distance + edge
  2*0+1, 2*3+0, 2*2+0, 2*3+0, 2*2+0, 2*3+0, 2*4+0, 2*5+1,
  2*3+0, 2*2+0, 2*1+0, 2*2+0, 2*3+0, 2*4+0, 2*3+0, 2*4+0,
  2*2+0, 2*1+0, 2*4+0, 2*3+0, 2*2+0, 2*3+0, 2*4+0, 2*5+0,
  2*3+0, 2*2+0, 2*3+0, 2*2+0, 2*3+0, 2*4+0, 2*3+0, 2*4+0,
  2*2+0, 2*3+0, 2*2+0, 2*3+0, 2*4+0, 2*3+0, 2*4+0, 2*5+0,
  2*3+0, 2*4+0, 2*3+0, 2*4+0, 2*3+0, 2*4+0, 2*5+0, 2*4+0,
  2*4+0, 2*3+0, 2*4+0, 2*3+0, 2*4+0, 2*5+0, 2*4+0, 2*5+0,
  2*5+1, 2*4+0, 2*5+0, 2*4+0, 2*5+0, 2*4+0, 2*5+0, 2*6+1
};

__forceinline
unsigned int absRankFileDiff(unsigned int a, unsigned int b) {
  return abs((a|7)-(b|7)) ^ abs((a&7)-(b&7));
}

// precondition a,b < 64
unsigned int knightDistance(unsigned  int a, unsigned int b) {
  unsigned int c = absRankFileDiff(a,b);
  return 2*((knedge[a]^knedge[b]) & (c==9)) + (knedge[c]>>1);
}


// precondition a,b < 64
unsigned int __fastcall _knightDistance(unsigned int a, unsigned int b) {
#ifdef _DEBUG
  __asm {
    mov ecx, [a]
    mov edx, [b]
  }
#endif
  __asm {
    mov eax,ecx
    mov ebx,edx
    and ecx,7
    and edx,7
    sub edx,ecx ; (a&7)-(b&7)
    mov cl,knedge[ebx]
    or ebx,7
    xor cl,knedge[eax]
    or eax,7
    sub eax,ebx ; ((a|7)-(b|7))
    ; use ah,dh as {0,-1}-signmask for abs
    add dl,dh   ; abs(x) := (x+signmask)^signmask
    add al,ah   ; for both differences
    xor eax,edx ; kind of SIMD
    xor al,ah   ; c = abs((a|7)-(b|7)) ^ abs((a&7)-(b&7));
    cmp al,9
    mov bl,al   ; high bytes of ebx still zero
    movzx eax,knedge[ebx]
    sete bl     ; c==9
    shr eax,1   ; distance >> 1
    and ebx,ecx ; edge[a]^edge[b] & (c==9)
    lea eax,[2*ebx+eax]
  }
}



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.