Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Bitboard assembly question

Author: Matt Taylor

Date: 12:33:25 02/26/03

Go up one level in this thread


On February 26, 2003 at 09:09:42, José Carlos wrote:

>  Hi all,
>  there're quite some implementations of FirstOne and LastOne out there, but I
>enjoy trying to figure out things myself. I've forgotten most of my assembly
>lessons though.
>  Last night I came up with the following function:
>
>INLINE UINT32 LeftMostOne(UINT64 bb)
>{
>	__asm
>	{
>		bsr edx,dword ptr [bb+4]	// left side (a8-h5)
>		jz RightSide
>		mov eax,31
>		jmp Substract
>	  RightSide:
>		bsr edx,dword ptr [bb]
>		mov eax,63
>	  Substract:
>		sub eax,edx
>	}
>}
>
>  Note that my board representation is MSB(bit 63) = a8; bit 62 = b8; ...; (bit
>1) = g1; LSB(bit 0) = h1
>  The function calculates the letmost (closest to a8) bit.
>  I'd like to know what inneficient things I'm doing in the above function and
>why.
>
>  Thanks in advance,
>
>  José C.

It's good code for an older chip, but modern processors like to avoid branches.
Here's how I would rewrite your code to save a branch:

		bsr edx,dword ptr [bb+4]	// left side (a8-h5)
		mov eax,31
		jnz Subtract

		bsr edx,dword ptr [bb]
		mov eax,63

	  Subtract:
		sub eax,edx

Note that the mov instruction does not update the flags, so this works out
alright. You can also write a branchless version using another register and
cmovcc:

		xor	eax,eax
		bsr	ecx,dword ptr [bb]
		bsr	edx,dword ptr [bb+4]	// left side (a8-h5)
		cmovnz	edx,ecx
		setnz	al
		sub	edx,31	// backward so edx needs to be inverted
		shl	eax,5
		sub	eax,edx	// eax = 0 or 32, this fixes all

Just a note on the xor isntruction at the top, this helps Athlon, P6 (Pentium
Pro/2/3), and the Pentium 4. If not used, you can incur partial register stalls.
Also, you wouldn't get what you wanted -- the upper 24-bits of the eax register
probably aren't 0, so you would get bad results at the end.

The only processor that efficiently handles bsr is the P6. For others like
Pentium 4, K6, or Athlon, you'll want to avoid the second bsr. Since you only
use the right side when the left side is 0, you can substitute a cmp to set the
flags and then use either adc/sbb or setcc to stick the result in a register.
You can then use the AGU with complex addressing to extract the appropriate
half. Here's an example:

		xor	eax,eax
		test	dword ptr [bb+4],-1
		setnz	al
		bsr	edx,dword ptr [bb+eax*4]
		shl	eax,5
		sub	edx,31	// backward so edx needs to be inverted
		sub	eax,edx	// eax = 0 or 32, this fixes all

-or-

		cmp	dword ptr [bb+4], 0
		sbb	eax,eax	// eax = -1 or 0
		bsr	edx,dword ptr [bb+eax*4+4]
		and	eax,32
		sub	edx,31	// backward so edx needs to be inverted
		sub	eax,edx	// eax = 0 or 32, this fixes all

Both segments of code take 14 cycles on Athlon. Also, both of these segments of
code will execute correctly on a 386 unlike the cmovcc version that I gave
above.

-Matt



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.