Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Bitscan Updated

Author: Gerd Isenberg

Date: 04:37:31 02/16/03

Go up one level in this thread


On February 16, 2003 at 05:14:41, Matt Taylor wrote:

>On February 14, 2003 at 23:52:09, Matt Taylor wrote:
>
>>On February 14, 2003 at 14:31:42, Gerd Isenberg wrote:
>>
>>>I tried two of your routines so far, the cmov/single-bsf one and the
>>>LSB_32_table lookup one. It's a mess, both without any branches, but IsiChess is
>>>stil 2-4% slower with these. I guess some chaotic code size dependent side
>>>effects (Athlon Xp 2000+, 512MB SDRAM).
>>>
>>>Regards,
>>>Gerd
>
>What -did- you use before? After thinking about it, I don't see how these
>routines could make such an impact. It would be very difficult to shrink them
>further.
>
>Have you tried not inlining them? That would make the resulting code smaller.
>
>-Matt

Hi Matt,

this one (posted already x times) is the routine i compared with:

__forceinline
UINT BitSearchAndReset(BitBoard &bb)
{
	__asm
	{
		mov		ebx, [bb]
		xor		eax, eax
		mov		edx, 1
		bsf		ecx, [ebx]
		jnz		found
		bsf		ecx, [ebx + 4]
		lea		ebx, [ebx + 4]
		xor		eax, 32
	found:
		shl		edx, cl
		xor		eax, ecx
		xor		[ebx], edx
	}

}

To my surprise this conditional routine using four registers is faster than this
short unconditional one but using three vector path instructions:

__forceinline
UINT BitSearchAndReset(BitBoard &bb)
{
	__asm
	{
		mov		edx, [bb]
		bsf		eax, [edx+4]
		xor		eax, 32
		bsf		eax, [edx]
		btr		[edx],eax
	}
}

With none inlining, your routines and even walters magic lsb64 lookup routine
are all about the same in average, but the nps and solution time performance is
significantly worse (4-10%).

Regards,
Gerd





This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.