Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: LSB and POPCOUNT optimization

Author: Gerd Isenberg

Date: 10:47:04 06/21/02

Go up one level in this thread


On June 20, 2002 at 17:10:46, Tim Foden wrote:

>I have tested this routine (find bit and remove it) in Green Light, but it
>appears to be slower than the current routine I am using.  I guess this must be
>because the BSF and BTR instructions are slow on my Athlon XP 1.46Gz.  It may
>also be because, for instance, when generating moves for white, most of the
>pieces will be in the low DWORD, and most of the moves will be made to the low
>DWORD, so I think that in general my routine will correctly predict the branch,
>and will take the short route through the code.
>
...snip
>
>Cheers, Tim.
>
>BTW, here is the code from my current routine :)
>
>inline Square	BitScanForwardRemove( UINT64& bitmap )
>{
>	__asm
>	{
>		mov		ebx, [bitmap]
>		mov		edx, 1
>
>		bsf		ecx, [ebx]
>		mov		eax, 0
>		jnz		found
>
>		bsf		ecx, [ebx + 4]
>		mov		eax, 32
>		lea		ebx, [ebx + 4]
>		jnz		found
>
>		mov		eax, XX
>		jmp		done
>
>	found:
>		shl		edx, cl
>		add		eax, ecx
>		xor		[ebx], edx
>
>	done:
>	}
>
>	// value returned in eax
>}
>
>
>Cheers, Tim.

Thanks Tim for sharing your code. I tried it and it's definitely faster (at
least on AMD1,4GHZ). I only have first impressions, and have to do some
profiling. In one testpositions so far a have a solution time of 102 sec instead
of 115 sec (may be influenced by some other chaotic optimizing or whatever
effects). Even if the number of used registers is larger in your inline routine
and therefore it may more difficult to hold vars of the calling routine in
registers, it's faster - very interesting.

It seems that this vector path instructions (bsf, btr) are very inefficient on
Athlons.

Cheers,
Gerd




This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.