Author: Gerd Isenberg
Date: 02:36:51 08/27/05
On August 27, 2005 at 04:34:03, Ed Schröder wrote:

>On August 27, 2005 at 00:43:29, Tony Werten wrote:
>
>>On August 26, 2005 at 18:12:30, Ed Schröder wrote:
>>
>>>I am no longer up-to-date regarding the newest processors (such as the AMD-64)
>>>and the internal working concerning speed, hence my question:
>>>
>>>Which (similar) code is faster?
>>>
>>>       test byte ptr xxx,1    |       test byte ptr xxx,1
>>>       je   label             |       mov  AL,[ECX]
>>>       mov  AL,[ECX]          |       je   label
>>>       mov  BL,[EDX]          |       mov  BL,[EDX]
>>>       ...  ........          |       ...  ........
>>>       ...  ........          |       ...  ........
>>>label:                        | label:
>>>
>>>Thanks in advance,
>>
>>Hi Ed,
>
>Hey Tony,
>
>
>>probably not what you wanted to know, but the code is quite different from each
>>other.
>>
>>If the jump condition is met 50% of the time, then the left code will execute
>>the 2 moves 50% of the time for an average of 1 move per loop and the right side
>>100%+50% is 1.5 moves per loop on average.
>>
>>Did you mean something else?
>
>Yep :)
>
>The background of my question is the processor's capability to do 2 instructions
>at the same time. Following this logic the code on the right (in principle) is
>supposed to be faster.
>
>
>
>>2 BTW's:
>>
>>1 Depending on what you do with AL and BL, you might want to use the full
>>registers by doing movzx eax,[ecx] and movzx ebx,[edx] (No penalty on new
>>processors)
>
>That's good to know, thank you.

Yes, reading bytes into partial registers is expensive anyway, 4 cycles latency.

MOV reg8, mem8                    8Ah       mm-xxx-xxx   DirectPath   4
MOV AL, mem8                      A0h                    DirectPath   4

Tony is right: zero-extending to AX, EAX or RAX also takes 4 cycles.

MOVZX reg16/32/64, mem8           0Fh B6h   mm-xxx-xxx   DirectPath   4

If you have a "global", very frequently used array (e.g. array[64]), it might be
worth wasting some memory (e.g. four cache lines instead of one) and switching
to the native 32-bit int size:

MOV reg16/32/64, mreg16/32/64     8Bh       11-xxx-xxx   DirectPath   1

Also, avoid the shorter but redundant EAX-move encoding:

MOV AX/EAX/RAX, mem16/32/64       A1h                    DirectPath   4/3/3

Gerd

>
>
>>2 This kind of code might be helped a lot with conditional moves. (All new
>>processors support that) It basically does a "if (cond) move eax,xx" without a
>>branch, so without the risk of branch mispredictions (very, very expensive on
>>new processors).
>
>I am aware of the cmove instruction but its use is very limited, have there been
>new extensions lately?
>
>Thanks,
>
>Ed
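
Gerd's suggestion above (widen a hot 64-entry byte table to native 32-bit ints, at the cost of four cache lines instead of one) can be made concrete with a small C sketch. It only illustrates the memory trade-off and does not measure load latency. The table names, the lookup helpers and the 64-byte cache-line size are assumptions for illustration, not taken from the post, and the line count assumes the table starts on a cache-line boundary.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines. */
#define CACHE_LINE_BYTES 64u

/* Hypothetical 64-entry lookup tables, one per element width. */
static uint8_t table8[64];    /* 64 * 1 byte  =  64 bytes */
static int32_t table32[64];   /* 64 * 4 bytes = 256 bytes */

/* The widened table costs exactly four times the memory of the byte table. */
static_assert(sizeof table32 == 4 * sizeof table8, "int table is 4x byte table");

/* Cache lines needed for `bytes` bytes, rounded up,
   assuming the object starts on a cache-line boundary. */
static inline size_t cache_lines(size_t bytes)
{
    return (bytes + CACHE_LINE_BYTES - 1) / CACHE_LINE_BYTES;
}

/* Byte table: each load has to be widened to int (typically a movzx). */
static inline int lookup8(unsigned i)
{
    return table8[i & 63];
}

/* Native-width table: a plain 32-bit load is enough. */
static inline int lookup32(unsigned i)
{
    return table32[i & 63];
}
```

With these definitions, cache_lines(sizeof table8) is 1 and cache_lines(sizeof table32) is 4, which is the "four cache lines instead of one" from the post. Whether the wider table pays off depends on how hot it is and on what else competes for the cache, so it is worth timing both variants in the real program.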
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.