Author: Gerd Isenberg
Date: 13:15:01 07/18/03
On July 18, 2003 at 15:16:27, Tom Kerrigan wrote:

>On July 18, 2003 at 04:05:52, Walter Faxon wrote:
>
>>>; 326  : if (bbHalf) bb0 = bb1; // will code as cmov (ideally)
>>>
>>>    test    ecx, ecx
>>>    je      SHORT $L806
>>>    mov     eax, DWORD PTR _bb$[esp]
>>>$L806:
>>>
>>
>>
>>Stupid compiler, not only no cmov
>
>IIRC, on the P6 (Pentium Pro, Pentium II, Pentium III), the cmov instruction
>gets translated into a string of uOps that's equivalent to testing, branching,
>and copying.
>
>In other words, there is no performance benefit (I believe there may actually be
>a performance penalty) to using cmov on a P6, and it breaks compatibility with
>pre-P6 processors, so it's little wonder the P6-era MS compiler doesn't generate
>cmovs.
>
>-Tom

Hi Tom,

seems that AMD is faster here too, cmov is a DirectPath instruction on K7:

Athlon:
CMOVE/CMOVZ reg16/32, reg16/32        0Fh 44h 11-xxx-xxx DirectPath 1
CMOVE/CMOVZ reg16/32, mem16/32        0Fh 44h mm-xxx-xxx DirectPath 4

Opteron:
CMOVE/CMOVZ reg16/32/64, mem16/32/64  0Fh 44h mm-xxx-xxx DirectPath 4
CMOVE/CMOVZ reg16/32/64, reg16/32/64  0Fh 44h 11-xxx-xxx DirectPath 1

So here is a cmov version, as Walter suggested (I guess), for MSVC and Athlon
(trying to outperform the compiler and even bsf ;-)

Cheers,
Gerd

int leastSigBit64(BitBoard bb)
{
    __asm
    {
        mov     ecx, DWORD PTR [bb]         ; low 32 bits
        xor     eax, eax
        test    ecx, ecx
        cmove   ecx, DWORD PTR [bb+4]       ; low half empty: take high half
        sete    al                          ; al = 1 if the high half was taken
        lea     edx, DWORD PTR [ecx-1]
        xor     edx, ecx                    ; edx = x ^ (x-1)
        imul    edx, 130329821              ; 0x07C4ACDD
        shl     eax, 5                      ; 0 or 32
        shr     edx, 27                     ; 5-bit table index
        or      al, [MT32table + edx]       ; result is returned in eax
    }
}

// and for P4, Matt's version without cmov or branch:

int leastSigBit64(BitBoard bb)
{
    bb ^= (bb - 1);
    unsigned int folded = ((int) bb) ^ ((int)(bb>>32));
    return lsz64_tbl[folded * 0x78291ACF >> 26];
}
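
For anyone who wants to try the two routines without MSVC inline assembly, here is a rough portable C sketch of both. It is only an illustration under some assumptions: the contents of MT32table and lsz64_tbl are not given in the post, so the sketch fills its own stand-in tables (mt32_table, lsz64_table) at startup by hashing every possible mask x ^ (x-1). The names init_tables, lsb64_cmov_style and lsb64_folded are made up for this example, and the ternary only mirrors the cmove; whether a compiler actually emits cmov for it depends on the compiler and its flags.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t BitBoard;

/* Stand-ins for MT32table and lsz64_tbl (assumption: the real tables
   are not shown in the post, so they are rebuilt from the multipliers). */
static unsigned char mt32_table[32];
static unsigned char lsz64_table[64];

/* Hash every mask of the form x ^ (x-1), i.e. bits 0..k set,
   and store k at the slot the hash selects. */
void init_tables(void)
{
    for (int k = 0; k < 32; ++k) {
        uint32_t mask = 0xFFFFFFFFu >> (31 - k);
        mt32_table[(uint32_t)(mask * 130329821u) >> 27] = (unsigned char)k;
    }
    for (int k = 0; k < 64; ++k) {
        uint64_t mask = UINT64_MAX >> (63 - k);
        uint32_t folded = (uint32_t)mask ^ (uint32_t)(mask >> 32);
        lsz64_table[(uint32_t)(folded * 0x78291ACFu) >> 26] = (unsigned char)k;
    }
}

/* C rendering of the cmov routine: pick the non-empty half,
   hash x ^ (x-1) with 0x07C4ACDD, add 32 if the high half was used. */
int lsb64_cmov_style(BitBoard bb)
{
    assert(bb != 0);
    uint32_t lo   = (uint32_t)bb;
    uint32_t hi   = (uint32_t)(bb >> 32);
    uint32_t half = (lo == 0);              /* sete al              */
    uint32_t x    = half ? hi : lo;         /* cmove ecx, [bb+4]    */
    uint32_t mask = x ^ (x - 1);            /* lea edx / xor edx    */
    return (int)((half << 5) | mt32_table[(uint32_t)(mask * 130329821u) >> 27]);
}

/* Matt's folding version: no cmov, no branch. */
int lsb64_folded(BitBoard bb)
{
    assert(bb != 0);
    bb ^= bb - 1;
    uint32_t folded = (uint32_t)bb ^ (uint32_t)(bb >> 32);
    return lsz64_table[(uint32_t)(folded * 0x78291ACFu) >> 26];
}
```

Call init_tables() once before using either function. Both expect a non-empty bitboard and should agree on the index of the least significant one bit.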
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.