Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Nalimov: bsf/bsr intrinsics implementation still not optimal

Author: Gerd Isenberg

Date: 10:38:48 09/23/04

Go up one level in this thread


<snip>
>>An additional jnc after bsf/bsr is not so expensive, if always not taken,
>>specially compared to the huge latency of bsf/bsr. But it leaves the option to
>>build the traverse loop without an explicit != zero test.
>>
>
>That's exactly why you should do the test before bit scan. Should the return
>value be used as you described,  the prototype would be blamed encouraging
>sub-optimal programming:)

Ok, but only because of the huge latency of bsf, even if mask is zero - and the
ability of ms-compiler, to interprete the zero flag already after clearing the
found bit with mask ^= m1. I guess not all compiler are able to do that and some
generate an explicit test.

Otherwise one could argue, that using an explicte zero test is a waste of time,
because bsf does the test anyway ;-)


int testbsf(int mask)
{
	int clone = 0;

	if (mask)
	{   next:
		int index = _BitScanForward(mask);
		int m1 = 1 << index;
		clone |= m1;
		if ( mask ^= m1 )
                   goto next;
	}
	return clone;
}

versus

int testbsf(int mask)
{
	int clone = 0;
        int index

	while ( (index = _BitScanForward(mask)) >= 0 )
	{
		int m1 = 1 << index;
		mask  ^= m1;
		clone |= m1;
	}
	return clone;
}

with something like

fastcall
int _BitScanForward(unsigned int mask)
{
	__asm
	{
		bsf		eax,ecx
		jnz		bsfend ; zero flag not carry as i said before!
		mov		eax,-1
	bsfend:
	}
}

Cheers,
Gerd



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.