Computer Chess Club Archives

Search

Terms

Messages

Subject: Re: LSB and POPCOUNT optimization

Author: Tim Foden

Date: 15:49:11 06/21/02

On June 21, 2002 at 13:47:04, Gerd Isenberg wrote:

>On June 20, 2002 at 17:10:46, Tim Foden wrote:
>
>Thanks Tim for sharing your code. I tried it and it's definitely faster (at
>least on AMD1,4GHZ). I only have first impressions, and have to do some
>profiling. In one testpositions so far a have a solution time of 102 sec instead
>of 115 sec (may be influenced by some other chaotic optimizing or whatever
>effects). Even if the number of used registers is larger in your inline routine
>and therefore it may more difficult to hold vars of the calling routine in
>registers, it's faster - very interesting.
>
>It seems that this vector path instructions (bsf, btr) are very inefficient on
>Athlons.

It's nice to know I'm not going mad  :)  I keep trying different versions of
this algorithm, but I always find that the one with branches is fastest!

BTW, here is the fastest one I have that doesn't have branches.  It is only a
little slower.  It is only faster in cases where all the pieces are on the
opposite side of the board than normal.  It still uses 4 registers too, I'm
afraid.  Also, this one doesn't cope with empty bitmaps.

inline Square	BitScanForwardRemove( UINT64& bitmap )
{
	ASSERT( bitmap );
	__asm
	{
		mov		ebx, [bitmap]
		mov		eax, 0
		cmp		dword ptr [ebx], eax
		mov		edx, 1
		setz	al
		bsf		ecx, [ebx + eax * 4]
		lea		ebx, [ebx + eax * 4]
		shl		eax, 5
		shl		edx, cl
		add		eax, ecx
		xor		[ebx], edx
	}

	// value returned in eax
}

Cheers, Tim.

This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.