Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Two questions: FirstOne() in Crafty and BSF/BSR

Author: Vincent Diepeveen

Date: 10:05:26 08/09/02

Go up one level in this thread


On August 08, 2002 at 11:47:35, Gerd Isenberg wrote:

This is really amazing to hear!

However we must take into account that in specint, inline assembly
is forbidden AFAIK. k7 is from before crafty area.

We will see what the future brings!

What i don't understand is why BSF is under the 'vector' instructions.
In my assembly manuals it's under something else.

What makes this a vector instruction?

>On August 08, 2002 at 08:37:48, Gian-Carlo Pascutto wrote:
>
>>On August 08, 2002 at 08:11:13, Vincent Diepeveen wrote:
>>
>>
>>>>This is very short, but in tests with my Athlon XP, it is no faster than the
>>>>version with branches and checks for zeroed bitmaps.
>>>
>>>that's of course a dumb test to do. you should try each time a random
>>>chosen bitmap.
>>
>>I think you misread what he wrote. He didn't test with zero bitmaps.
>>
>>The problem of the short routine is that bsf is very slow on the Athlon,
>>and it can only be handled by the vector decoder.
>>
>>In a bitboard program, typical case is to have an empty half-bitboard, and
>>avoiding the bsf is faster even if it's occasionally mispredicted.
>>
>>--
>>GCP
>
>I can confim this and was very surprised, that this (introduced by Tim and
>slighly modified) was significant faster in average ...
>
>// precondition: bb not null
>__forceinline unsigned int BitSearchAndReset(BitBoard &bb)
>{
>	__asm
>	{
>		xor		edx, edx
>		mov		ebx, [bb]
>		mov		eax, edx
>		inc		edx
>
>		bsf		ecx, [ebx]
>		jnz		found
>
>		bsf		ecx, [ebx + 4]
>		lea		ebx, [ebx + 4]
>		xor		eax, 32
>	found:
>		shl		edx, cl
>		xor		eax, ecx
>		xor		[ebx], edx
>	}
>}
>
>... than this short one,  ...
>
>// precondition: bb not null
>__forceinline unsigned int BitSearchAndReset(BitBoard &bb)
>{
>	__asm
>	{
>		mov		edx, [bb]
>		bsf		eax, [edx+4]
>		xor		eax, 32
>		bsf		eax, [edx]
>		btr		[edx],eax
>	}
>}
>
>...  even if it used 4 registers instead of two. Three vector path instructions
>in a row seem to be real performance killers on athlon.
>
>Gerd



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.