Computer Chess Club Archives

Subject: Re: MMX results

Author: Gerd Isenberg

Date: 03:11:46 09/06/03


On September 05, 2003 at 10:29:47, Dezhi Zhao wrote:

>MMX would be a clear win if it could go without the dirty emms instruction.
>Here are the results without emms:
>
>#1
>old_key = 18be678400294823, new_key = 4512153f17260b03,  c++ = 30s
>old_key = 18be678400294823, new_key = 4512153f17260b03,  asm = 26s
>old_key = 18be678400294823, new_key = 4512153f17260b03,  mmx = 19s
>old_key = 18be678400294823, new_key = 4512153f17260b03,  sse = 22s
>old_key = 18be678400294823, new_key = 4512153f17260b03,  sse2 = 23s
>
>#2
>old_key = 18be678400294823, new_key = 4512153f17260b03,  c++ = 31s
>old_key = 18be678400294823, new_key = 4512153f17260b03,  asm = 25s
>old_key = 18be678400294823, new_key = 4512153f17260b03,  mmx = 19s
>old_key = 18be678400294823, new_key = 4512153f17260b03,  sse = 23s
>old_key = 18be678400294823, new_key = 4512153f17260b03,  sse2 = 23s
>
>#3
>old_key = 18be678400294823, new_key = 4512153f17260b03,  c++ = 30s
>old_key = 18be678400294823, new_key = 4512153f17260b03,  asm = 26s
>old_key = 18be678400294823, new_key = 4512153f17260b03,  mmx = 19s
>old_key = 18be678400294823, new_key = 4512153f17260b03,  sse = 22s
>old_key = 18be678400294823, new_key = 4512153f17260b03,  sse2 = 23s
>
>If emms is included, the time goes up to 39s.
>
>
>__declspec(naked) void __fastcall update_key_non_capture_mmx(int move)
>{
>	__asm
>	{
>		movzx	eax, cl			// from
>		movzx	edx, ch			// to
>		shr	ecx, 10			// type * 64
>		and	ecx, ~63		// mask off
>		movq	mm2, [old_key]		// old_key 64
>
>		add	eax, ecx		// type from index
>		add	edx, ecx		// type to index
>
>		movq	mm0, type_rnd[eax*8]	// from 64
>		movq	mm1, type_rnd[edx*8]	// to 64
>		pxor	mm0, mm2
>		pxor	mm0, mm1
>
>		movq	[new_key], mm0		// store 64
>//		emms				// performance killer
>		ret
>	}
>}
>
>
>Regards,
>
>Zhao

Thanks for posting the MMX results. In my current program I use MMX a lot without
emms (or femms on Athlon), because I do no float or double arithmetic.
I also use MFC, and that does not seem to be a problem. Otherwise, executing emms
once before and once after the root search may be a solution.

Btw, MOVQ is also an SSE2 instruction that works on XMM registers. If the target
is an XMM register, the upper 64 bits are zeroed. On Opteron, SSE2 MOVQ is
"double direct path", so I guess SSE2 MOVQ is a bit slower than MMX MOVQ,
which may explain the roughly 22s SSE/SSE2 timings.
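
For reference, here is a plain C++ sketch of what the quoted routine appears to
compute: old_key XOR the "from" entry XOR the "to" entry. The move layout (from
in bits 0-7, to in bits 8-15, type from bit 16 up) is read off the asm. The
table size of 16 types and the fill_type_rnd helper are assumptions made only
for illustration; they are not from the original post.

```cpp
#include <cassert>
#include <cstdint>

// Assumed table size: 16 piece types x 64 squares (hypothetical).
static const std::uint32_t kTypes = 16;
static std::uint64_t type_rnd[kTypes * 64];

// Hypothetical helper: fill the table with a xorshift64 sequence.
// The seed must be nonzero.
static void fill_type_rnd(std::uint64_t seed) {
    std::uint64_t x = seed;
    for (std::uint32_t i = 0; i < kTypes * 64; ++i) {
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        type_rnd[i] = x;
    }
}

// Same computation as the quoted asm, under the assumed move layout.
static std::uint64_t update_key_non_capture(std::uint64_t old_key,
                                            std::uint32_t move) {
    std::uint32_t from   = move & 0xFF;          // movzx eax, cl
    std::uint32_t to     = (move >> 8) & 0xFF;   // movzx edx, ch
    std::uint32_t type64 = (move >> 10) & ~63u;  // shr ecx, 10 / and ecx, ~63
    // Squares and type must fit the assumed table.
    assert(from < 64 && to < 64 && type64 < kTypes * 64);
    return old_key ^ type_rnd[type64 + from] ^ type_rnd[type64 + to];
}
```

Because XOR is its own inverse, applying the reversed move (from and to swapped,
same type) to the new key gives back the old key.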

Regards,
Gerd
