Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Two questions: FirstOne() in Crafty and BSF/BSR

Author: Gerd Isenberg

Date: 08:47:35 08/08/02

Go up one level in this thread


On August 08, 2002 at 08:37:48, Gian-Carlo Pascutto wrote:

>On August 08, 2002 at 08:11:13, Vincent Diepeveen wrote:
>
>
>>>This is very short, but in tests with my Athlon XP, it is no faster than the
>>>version with branches and checks for zeroed bitmaps.
>>
>>that's of course a dumb test to do. you should try each time a random
>>chosen bitmap.
>
>I think you misread what he wrote. He didn't test with zero bitmaps.
>
>The problem of the short routine is that bsf is very slow on the Athlon,
>and it can only be handled by the vector decoder.
>
>In a bitboard program, typical case is to have an empty half-bitboard, and
>avoiding the bsf is faster even if it's occasionally mispredicted.
>
>--
>GCP

I can confim this and was very surprised, that this (introduced by Tim and
slighly modified) was significant faster in average ...

// precondition: bb not null
__forceinline unsigned int BitSearchAndReset(BitBoard &bb)
{
	__asm
	{
		xor		edx, edx
		mov		ebx, [bb]
		mov		eax, edx
		inc		edx

		bsf		ecx, [ebx]
		jnz		found

		bsf		ecx, [ebx + 4]
		lea		ebx, [ebx + 4]
		xor		eax, 32
	found:
		shl		edx, cl
		xor		eax, ecx
		xor		[ebx], edx
	}
}

... than this short one,  ...

// precondition: bb not null
__forceinline unsigned int BitSearchAndReset(BitBoard &bb)
{
	__asm
	{
		mov		edx, [bb]
		bsf		eax, [edx+4]
		xor		eax, 32
		bsf		eax, [edx]
		btr		[edx],eax
	}
}

...  even if it used 4 registers instead of two. Three vector path instructions
in a row seem to be real performance killers on athlon.

Gerd



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.