Author: Gerd Isenberg
Date: 03:11:46 09/06/03
On September 05, 2003 at 10:29:47, Dezhi Zhao wrote:
>MMX would be a clear win if it could go without the dirty emms instruction.
>Here are the results without emms:
>
>#1
>old_key = 18be678400294823, new_key = 4512153f17260b03, c++ = 30s
>old_key = 18be678400294823, new_key = 4512153f17260b03, asm = 26s
>old_key = 18be678400294823, new_key = 4512153f17260b03, mmx = 19s
>old_key = 18be678400294823, new_key = 4512153f17260b03, sse = 22s
>old_key = 18be678400294823, new_key = 4512153f17260b03, sse2 = 23s
>
>#2
>old_key = 18be678400294823, new_key = 4512153f17260b03, c++ = 31s
>old_key = 18be678400294823, new_key = 4512153f17260b03, asm = 25s
>old_key = 18be678400294823, new_key = 4512153f17260b03, mmx = 19s
>old_key = 18be678400294823, new_key = 4512153f17260b03, sse = 23s
>old_key = 18be678400294823, new_key = 4512153f17260b03, sse2 = 23s
>
>#3
>old_key = 18be678400294823, new_key = 4512153f17260b03, c++ = 30s
>old_key = 18be678400294823, new_key = 4512153f17260b03, asm = 26s
>old_key = 18be678400294823, new_key = 4512153f17260b03, mmx = 19s
>old_key = 18be678400294823, new_key = 4512153f17260b03, sse = 22s
>old_key = 18be678400294823, new_key = 4512153f17260b03, sse2 = 23s
>
>If emms is included, the time goes up to 39s.
>
>
>__declspec(naked) void __fastcall update_key_non_capture_mmx(int move)
>{
> __asm
> {
> movzx eax, cl // from
> movzx edx, ch // to
> shr ecx, 10 // ecx = move >> 10
> and ecx, ~63 // clear low 6 bits: ecx = type * 64
> movq mm2, [old_key] // load 64-bit old_key
>
> add eax, ecx // index of (type, from)
> add edx, ecx // index of (type, to)
>
> movq mm0, type_rnd[eax*8] // 64-bit random for (type, from)
> movq mm1, type_rnd[edx*8] // 64-bit random for (type, to)
> pxor mm0, mm2 // mm0 = rnd(from) ^ old_key
> pxor mm0, mm1 // mm0 ^= rnd(to)
>
> movq [new_key], mm0 // store 64-bit new_key
>// emms // performance killer
> ret
> }
>}
>
>
>Regards,
>
>Zhao
Thanks for posting the MMX results. In my current program I use MMX a lot without
emms (or femms on Athlon), because the engine does no float or double arithmetic.
I also use MFC, and so far that does not seem to be a problem. Otherwise, executing
emms once before and once after the root search may be a solution.
By the way, MOVQ also exists as an SSE2 instruction operating on XMM registers. If the
destination is an XMM register, the upper 64 bits are zero-extended. On the Opteron,
SSE2 MOVQ is a "DirectPath Double" instruction, so I guess using SSE2 MOVQ is a bit
slower than MMX MOVQ, maybe in the same 22-second range as your SSE timings.
Regards,
Gerd
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.