Author: Gerd Isenberg
Date: 03:11:46 09/06/03
On September 05, 2003 at 10:29:47, Dezhi Zhao wrote:

>MMX would be a clear win if it could go without the dirty emms instruction.
>Here are the results without emms:
>
>#1
>old_key = 18be678400294823, new_key = 4512153f17260b03, c++ = 30s
>old_key = 18be678400294823, new_key = 4512153f17260b03, asm = 26s
>old_key = 18be678400294823, new_key = 4512153f17260b03, mmx = 19s
>old_key = 18be678400294823, new_key = 4512153f17260b03, sse = 22s
>old_key = 18be678400294823, new_key = 4512153f17260b03, sse2 = 23s
>
>#2
>old_key = 18be678400294823, new_key = 4512153f17260b03, c++ = 31s
>old_key = 18be678400294823, new_key = 4512153f17260b03, asm = 25s
>old_key = 18be678400294823, new_key = 4512153f17260b03, mmx = 19s
>old_key = 18be678400294823, new_key = 4512153f17260b03, sse = 23s
>old_key = 18be678400294823, new_key = 4512153f17260b03, sse2 = 23s
>
>#3
>old_key = 18be678400294823, new_key = 4512153f17260b03, c++ = 30s
>old_key = 18be678400294823, new_key = 4512153f17260b03, asm = 26s
>old_key = 18be678400294823, new_key = 4512153f17260b03, mmx = 19s
>old_key = 18be678400294823, new_key = 4512153f17260b03, sse = 22s
>old_key = 18be678400294823, new_key = 4512153f17260b03, sse2 = 23s
>
>If emms is included, the time goes up to 39s.
>
>
>__declspec(naked) void __fastcall update_key_non_capture_mmx(int move)
>{
>    __asm
>    {
>        movzx eax, cl               // from
>        movzx edx, ch               // to
>        shr   ecx, 10               // type * 64
>        and   ecx, ~63              // mask off
>        movq  mm2, [old_key]        // old_key 64
>
>        add   eax, ecx              // type from index
>        add   edx, ecx              // type to index
>
>        movq  mm0, type_rnd[eax*8]  // from 64
>        movq  mm1, type_rnd[edx*8]  // to 64
>        pxor  mm0, mm2
>        pxor  mm0, mm1
>
>        movq  [new_key], mm0        // store 64
>//      emms                        // performance killer
>        ret
>    }
>}
>
>
>Regards,
>
>Zhao

Thanks for posting the MMX result. In my current program I use MMX a lot without emms (or femms on Athlon), because the program does no float or double arithmetic. I also use MFC, and that does not seem to be a problem.
Otherwise, using emms before and after the root search may be a solution.

Btw, MOVQ is also an SSE2 instruction with XMM registers. If the target is an XMM register, the upper 64 bits are zeroed. On Opteron, SSE2 MOVQ is "double direct path", so I guess SSE2 MOVQ is a bit slower than MMX MOVQ, maybe in this 22 sec range.

Regards,
Gerd
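
For anyone who wants to try the SSE2 MOVQ variant without inline asm, here is a minimal sketch using intrinsics. It is not code from either poster. The move layout (from in bits 0-7, to in bits 8-15, piece type from bit 16 up) is inferred from the quoted routine; the table size of 16 types, the helper init_type_rnd and all function names are assumptions made for illustration. On targets without SSE2 the sketch falls back to the scalar path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define SKETCH_HAVE_SSE2 1
#endif

// Assumed size: 16 piece types * 64 squares (not stated in the thread).
static const std::size_t kTableSize = 16 * 64;
static std::uint64_t type_rnd[kTableSize];

// Hypothetical helper: fill the table with deterministic pseudo-random
// values (splitmix64) so the sketch can be exercised.
void init_type_rnd(std::uint64_t seed = 1) {
    for (std::size_t i = 0; i < kTableSize; ++i) {
        seed += 0x9E3779B97F4A7C15ULL;
        std::uint64_t z = seed;
        z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
        z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
        type_rnd[i] = z ^ (z >> 31);
    }
}

// Plain C++ equivalent of the quoted asm: key ^ rnd[type][from] ^ rnd[type][to].
std::uint64_t update_key_scalar(std::uint64_t old_key, std::uint32_t move) {
    const std::uint32_t from = move & 0xFF;         // movzx eax, cl
    const std::uint32_t to   = (move >> 8) & 0xFF;  // movzx edx, ch
    const std::uint32_t base = (move >> 10) & ~63u; // type * 64
    assert(base + from < kTableSize && base + to < kTableSize);
    return old_key ^ type_rnd[base + from] ^ type_rnd[base + to];
}

// Same update through XMM registers. _mm_loadl_epi64 and _mm_storel_epi64
// are the intrinsics for the 64-bit MOVQ load/store; no emms is needed
// because the MMX/x87 register file is not touched.
std::uint64_t update_key_movq(std::uint64_t old_key, std::uint32_t move) {
#ifdef SKETCH_HAVE_SSE2
    const std::uint32_t from = move & 0xFF;
    const std::uint32_t to   = (move >> 8) & 0xFF;
    const std::uint32_t base = (move >> 10) & ~63u;
    assert(base + from < kTableSize && base + to < kTableSize);
    __m128i k = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(&old_key));
    __m128i a = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(&type_rnd[base + from]));
    __m128i b = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(&type_rnd[base + to]));
    k = _mm_xor_si128(_mm_xor_si128(k, a), b);      // pxor, pxor
    std::uint64_t new_key;
    _mm_storel_epi64(reinterpret_cast<__m128i*>(&new_key), k);
    return new_key;
#else
    return update_key_scalar(old_key, move);        // fallback without SSE2
#endif
}

// Returns the upper 64 bits of an XMM register after a MOVQ load of value.
// With SSE2 this shows the zeroing of the upper half mentioned above.
std::uint64_t movq_upper_half(std::uint64_t value) {
#ifdef SKETCH_HAVE_SSE2
    __m128i v  = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(&value));
    __m128i hi = _mm_unpackhi_epi64(v, v);          // move high qword to low
    std::uint64_t upper;
    _mm_storel_epi64(reinterpret_cast<__m128i*>(&upper), hi);
    return upper;
#else
    (void)value;
    return 0;                                       // nothing to show without SSE2
#endif
}
```

Usage would be to call init_type_rnd() once at startup and then update_key_movq(old_key, move) per non-capture move. Whether this beats the MMX version depends on the CPU, as the timings in this thread suggest, so it should be measured rather than assumed.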
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.