Author: Gerd Isenberg
Date: 04:37:31 02/16/03
Go up one level in this thread
On February 16, 2003 at 05:14:41, Matt Taylor wrote:
>On February 14, 2003 at 23:52:09, Matt Taylor wrote:
>
>>On February 14, 2003 at 14:31:42, Gerd Isenberg wrote:
>>
>>>I tried two of your routines so far, the cmov/single-bsf one and the
>>>LSB_32_table lookup one. It's a mess, both without any branches, but IsiChess is
>>>stil 2-4% slower with these. I guess some chaotic code size dependent side
>>>effects (Athlon Xp 2000+, 512MB SDRAM).
>>>
>>>Regards,
>>>Gerd
>
>What -did- you use before? After thinking about it, I don't see how these
>routines could make such an impact. It would be very difficult to shrink them
>further.
>
>Have you tried not inlining them? That would make the resulting code smaller.
>
>-Matt
Hi Matt,
this one (posted already x times) is the routine i compared with:
__forceinline
UINT BitSearchAndReset(BitBoard &bb)
{
__asm
{
mov ebx, [bb]
xor eax, eax
mov edx, 1
bsf ecx, [ebx]
jnz found
bsf ecx, [ebx + 4]
lea ebx, [ebx + 4]
xor eax, 32
found:
shl edx, cl
xor eax, ecx
xor [ebx], edx
}
}
To my surprise this conditional routine using four registers is faster than this
short unconditional one but using three vector path instructions:
__forceinline
UINT BitSearchAndReset(BitBoard &bb)
{
__asm
{
mov edx, [bb]
bsf eax, [edx+4]
xor eax, 32
bsf eax, [edx]
btr [edx],eax
}
}
With none inlining, your routines and even walters magic lsb64 lookup routine
are all about the same in average, but the nps and solution time performance is
significantly worse (4-10%).
Regards,
Gerd
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.