Computer Chess Club Archives

Subject: Re: cmov isn't necessarily good

Author: Gerd Isenberg

Date: 13:15:01 07/18/03

On July 18, 2003 at 15:16:27, Tom Kerrigan wrote:

>On July 18, 2003 at 04:05:52, Walter Faxon wrote:
>
>>>; 326  :     if (bbHalf) bb0 = bb1;              // will code as cmov (ideally)
>>>
>>>	test	ecx, ecx
>>>	je	SHORT $L806
>>>	mov	eax, DWORD PTR _bb$[esp]
>>>$L806:
>>>
>>
>>
>>Stupid compiler, not only no cmov
>
>IIRC, on the P6 (Pentium Pro, Pentium II, Pentium III), the cmov instruction
>gets translated into a string of uOps that's equivalent to testing, branching,
>and copying.
>
>In other words, there is no performance benefit (I believe there may actually be
>a performance penalty) to using cmov on a P6, and it breaks compatibility with
>pre-P6 processors, so it's little wonder the P6-era MS compiler doesn't generate
>cmovs.
>
>-Tom

Hi Tom,

It seems AMD is faster here too; cmov is a DirectPath instruction on the K7:

Athlon:

Mnemonic                              Opcode   ModRM       Decode      Latency
CMOVE/CMOVZ reg16/32, reg16/32        0Fh 44h  11-xxx-xxx  DirectPath  1
CMOVE/CMOVZ reg16/32, mem16/32        0Fh 44h  mm-xxx-xxx  DirectPath  4

Opteron:

CMOVE/CMOVZ reg16/32/64, mem16/32/64  0Fh 44h  mm-xxx-xxx  DirectPath  4
CMOVE/CMOVZ reg16/32/64, reg16/32/64  0Fh 44h  11-xxx-xxx  DirectPath  1

So here is a cmov version, which Walter suggested (I guess), for MSVC and Athlon
(trying to outperform the compiler and even bsf ;-)

Cheers,
Gerd

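int leastSigBit64(BitBoard bb)
{
	__asm
	{
		mov	ecx, DWORD PTR [bb]		; ecx = low 32 bits
		xor	eax, eax
		test	ecx, ecx			; ZF = 1 if the low half is empty
		cmove	ecx, DWORD PTR [bb+4]		; then scan the high half instead
		sete	al				; al = 1 if the high half was chosen
		lea	edx, DWORD PTR [ecx-1]
		xor	edx, ecx			; ones up to and including the lowest set bit
		imul	edx, 130329821			; multiply by 0x07C4ACDD
		shl	eax, 5				; eax = 0 or 32
		shr	edx, 27				; 5-bit table index
		or	al, [MT32table + edx]		; add the bit index within the half
	}
	// the result is left in eax
}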
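For anyone who wants to check the routine without MSVC inline assembly, here is a rough portable C sketch of the same steps. The layout of MT32table is not shown in this post, so the sketch assumes one byte per entry and fills it at startup; the names initMT32table and leastSigBit64_c, and the uint64_t typedef for BitBoard, are made up for illustration. Whether a compiler emits cmov for the select is up to the compiler.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t BitBoard;            /* assumption: 64-bit unsigned bitboard */

static unsigned char MT32table[32];   /* assumption: one byte per index */

/* Fill the table so that the mask with n+1 low bits set maps to n. */
static void initMT32table(void)
{
    unsigned char seen[32] = {0};
    for (unsigned n = 0; n < 32; ++n) {
        uint32_t mask = 0xFFFFFFFFu >> (31 - n);   /* n+1 low bits set */
        uint32_t idx  = (mask * 130329821u) >> 27; /* same multiply and shift as the asm */
        assert(!seen[idx]);                        /* each mask must get its own slot */
        seen[idx] = 1;
        MT32table[idx] = (unsigned char)n;
    }
}

/* Result is meaningless for bb == 0, as in the assembly version. */
static int leastSigBit64_c(BitBoard bb)
{
    uint32_t lo   = (uint32_t)bb;
    uint32_t hi   = (uint32_t)(bb >> 32);
    uint32_t half = lo ? lo : hi;        /* the select that cmove performs */
    unsigned base = lo ? 0u : 32u;       /* sete al / shl eax, 5 */
    uint32_t mask = half ^ (half - 1);   /* lea / xor */
    return (int)(base | MT32table[(mask * 130329821u) >> 27]);
}
```

Call initMT32table() once before the first lookup.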

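// And for the P4, Matt's version without cmov or branch:
int leastSigBit64(BitBoard bb)
{
	bb ^= (bb - 1);		// ones up to and including the lowest set bit
	unsigned int folded = ((int) bb) ^ ((int)(bb>>32));	// fold 64 bits into 32
	return lsz64_tbl[(folded * 0x78291ACF) >> 26];	// 6-bit table index
}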
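The contents of lsz64_tbl are not given in this post. As a sketch, the table can be built by running each single-bit board through the same fold and multiply and storing the bit number at the resulting index. The helper names foldedIndex, initLsz64Table and leastSigBit64_folded, the byte-sized table entries, and the uint64_t typedef are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t BitBoard;            /* assumption: 64-bit unsigned bitboard */

static unsigned char lsz64_tbl[64];   /* assumption: one byte per index */

/* Fold and hash exactly as in the routine above. */
static unsigned foldedIndex(BitBoard bb)
{
    bb ^= (bb - 1);
    uint32_t folded = (uint32_t)bb ^ (uint32_t)(bb >> 32);
    return (folded * 0x78291ACFu) >> 26;
}

/* Store each bit number at the index its single-bit board hashes to. */
static void initLsz64Table(void)
{
    unsigned char seen[64] = {0};
    for (unsigned n = 0; n < 64; ++n) {
        unsigned idx = foldedIndex((BitBoard)1 << n);
        assert(!seen[idx]);           /* each bit must get its own slot */
        seen[idx] = 1;
        lsz64_tbl[idx] = (unsigned char)n;
    }
}

/* Result is meaningless for bb == 0. */
static int leastSigBit64_folded(BitBoard bb)
{
    return lsz64_tbl[foldedIndex(bb)];
}
```

Only the lowest set bit affects the index, because bb ^ (bb - 1) discards everything above it, so filling the table from the 64 single-bit boards covers every nonzero input.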


Last modified: Thu, 15 Apr 21 08:11:13 -0700
