Author: Tim Foden
Date: 15:49:11 06/21/02
Go up one level in this thread
On June 21, 2002 at 13:47:04, Gerd Isenberg wrote:
>On June 20, 2002 at 17:10:46, Tim Foden wrote:
>
>Thanks Tim for sharing your code. I tried it and it's definitely faster (at
>least on AMD1,4GHZ). I only have first impressions, and have to do some
>profiling. In one testpositions so far a have a solution time of 102 sec instead
>of 115 sec (may be influenced by some other chaotic optimizing or whatever
>effects). Even if the number of used registers is larger in your inline routine
>and therefore it may more difficult to hold vars of the calling routine in
>registers, it's faster - very interesting.
>
>It seems that this vector path instructions (bsf, btr) are very inefficient on
>Athlons.
It's nice to know I'm not going mad :) I keep trying different versions of
this algorithm, but I always find that the one with branches is fastest!
BTW, here is the fastest one I have that doesn't have branches. It is only a
little slower. It is only faster in cases where all the pieces are on the
opposite side of the board than normal. It still uses 4 registers too, I'm
afraid. Also, this one doesn't cope with empty bitmaps.
inline Square BitScanForwardRemove( UINT64& bitmap )
{
ASSERT( bitmap );
__asm
{
mov ebx, [bitmap]
mov eax, 0
cmp dword ptr [ebx], eax
mov edx, 1
setz al
bsf ecx, [ebx + eax * 4]
lea ebx, [ebx + eax * 4]
shl eax, 5
shl edx, cl
add eax, ecx
xor [ebx], edx
}
// value returned in eax
}
Cheers, Tim.
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.