Author: Gerd Isenberg
Date: 13:53:40 12/19/05
Hi,
i am aware of getting a knight-distance of two squares is not that important -
but entertainment. I am aware of 0x88-difference to index a 15*16 table to get
distance, taxi-distance, absolute rank-file-difference or even the
knight-distance which should be conditionally modified for the one-square
diagonal edge cases, where the knight need four jumps from eg. a1 to b2 instead
of two from b2 to c3.
In my "original" routine the dense of information keeping two int[64]-arrays for
knight-distance 0..6 and boolean information of edges was too low for this rare
code. With the cost of a shift one both arrays are combined to one
amd64-cacheline.
The use of abs-intrinsic from math.h, performing an branchless abs like
(x ^ (x>>31))-(x>>31) with cdq-instruction, seems better to optimize by the
msvc6 compiler than an "handmade" abs.
By using only six or even three (file, rank) significant bits of 32 - instead of
making an explicite signmask for abs via mov,sar31 or cdq (works only with
eax,edx) - the bytewise signmask is implicitly in the upper byte register after
a 32-bit subtraction with eax,ebx,ecx,edx. Also there is a kind of simd with
one 32-bit xor.
No way to force my compiler to produce such code with char etc. other than
inline assembly. While it is usually a bad idea to work with partial
byte-registers - specially targeting high-byte register before using the whole
32-bits - it works surprisingly well here.
The C-version is 81 byte and takes 16 cycles, while the inline asm is exactly
one amd64 cacheline and takes 11 cycles, both including call/return. Since the
int-variables a,b,c are all used to index a char[64]-array, isn't that
information appropriate to guide the compiler about reduced value range
optimizations?
Cheers,
Gerd
#include <math.h>
const unsigned char knedge[64] = {
// 2*knight-distance + edge
2*0+1, 2*3+0, 2*2+0, 2*3+0, 2*2+0, 2*3+0, 2*4+0, 2*5+1,
2*3+0, 2*2+0, 2*1+0, 2*2+0, 2*3+0, 2*4+0, 2*3+0, 2*4+0,
2*2+0, 2*1+0, 2*4+0, 2*3+0, 2*2+0, 2*3+0, 2*4+0, 2*5+0,
2*3+0, 2*2+0, 2*3+0, 2*2+0, 2*3+0, 2*4+0, 2*3+0, 2*4+0,
2*2+0, 2*3+0, 2*2+0, 2*3+0, 2*4+0, 2*3+0, 2*4+0, 2*5+0,
2*3+0, 2*4+0, 2*3+0, 2*4+0, 2*3+0, 2*4+0, 2*5+0, 2*4+0,
2*4+0, 2*3+0, 2*4+0, 2*3+0, 2*4+0, 2*5+0, 2*4+0, 2*5+0,
2*5+1, 2*4+0, 2*5+0, 2*4+0, 2*5+0, 2*4+0, 2*5+0, 2*6+1
};
__forceinline
unsigned int absRankFileDiff(unsigned int a, unsigned int b) {
return abs((a|7)-(b|7)) ^ abs((a&7)-(b&7));
}
// precondition a,b < 64
unsigned int knightDistance(unsigned int a, unsigned int b) {
unsigned int c = absRankFileDiff(a,b);
return 2*((knedge[a]^knedge[b]) & (c==9)) + (knedge[c]>>1);
}
// precondition a,b < 64
unsigned int __fastcall _knightDistance(unsigned int a, unsigned int b) {
#ifdef _DEBUG
__asm {
mov ecx, [a]
mov edx, [b]
}
#endif
__asm {
mov eax,ecx
mov ebx,edx
and ecx,7
and edx,7
sub edx,ecx ; (a&7)-(b&7)
mov cl,knedge[ebx]
or ebx,7
xor cl,knedge[eax]
or eax,7
sub eax,ebx ; ((a|7)-(b|7))
; use ah,dh as {0,-1}-signmask for abs
add dl,dh ; abs(x) := (x+signmask)^signmask
add al,ah ; for both differences
xor eax,edx ; kind of SIMD
xor al,ah ; c = abs((a|7)-(b|7)) ^ abs((a&7)-(b&7));
cmp al,9
mov bl,al ; high bytes of ebx still zero
movzx eax,knedge[ebx]
sete bl ; c==9
shr eax,1 ; distance >> 1
and ebx,ecx ; edge[a]^edge[b] & (c==9)
lea eax,[2*ebx+eax]
}
}
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.