Author: Gunther Piez
Date: 12:56:03 08/27/05
On August 27, 2005 at 04:34:03, Ed Schröder wrote:

>On August 27, 2005 at 00:43:29, Tony Werten wrote:
>
>>On August 26, 2005 at 18:12:30, Ed Schröder wrote:
>>
>>>I am no longer up-to-date regarding the newest processors (such as the AMD-64)
>>>and the internal working concerning speed, hence my question:
>>>
>>>Which (similar) code is faster?
>>>
>>>    test byte ptr xxx,1  |    test byte ptr xxx,1
>>>    je   label           |    mov  AL,[ECX]
>>>    mov  AL,[ECX]        |    je   label
>>>    mov  BL,[EDX]        |    mov  BL,[EDX]
>>>    ...  ........        |    ...  ........
>>>    ...  ........        |    ...  ........
>>>label:                   |label:
>>>
>>>Thanks in advance,
>>
>>Hi Ed,
>
>Hey Tony,
>
>
>>probably not what you wanted to know, but the code is quite different from each
>>other.
>>
>>If the jump condition is met 50% of the time, then the left code will execute
>>the 2 moves 50% of the time for an average of 1 move per loop and the right side
>>100%+50% is 1.5 moves per loop on average.
>>
>>Did you mean something else ?
>
>Yep :)
>
>The background of my question is the processor's capability to do 2 instructions
>at the same time. Following this logic the code on the right (in principle) is
>supposed to be faster.

Actually, for quite some time it has been up to three instructions per cycle, at least on the Pentium Pro and later and on the Athlon. Internal parallelization of instructions is better on Athlons, though. For instruction sequences which contain only a few loads/stores (on average no more than one every three instructions), it is quite often possible to execute more than two instructions per cycle.

>>
>>1 Depending on what you do with AL and BL, you might want to use the full
>>registers by doing movzx eax,[ecx] and movzx ebx,[edx] (No penalty on new
>>processors)
>

I second that. It's not only that "movzx" is as fast as "mov"; you will also get huge penalties if you try to read from a register (or memory) which has been partially written before. Search for "false dependency" in the AMD64 manual. I recommend avoiding al, ah, ax and so on completely :-)

>I am aware of the cmove instruction but its use is very limited,

For short conditional expressions that depend on completely random data, evaluating the expression every time (before the jump) and then selecting the result with a few cmovs may be faster than a branch that is mispredicted 50% of the time (about a 10-cycle penalty on the Athlon, even more on the P4). But I admit that most of the time it isn't very useful.

>have there been
>new extensions lately?

64 bit? SSE? :-)
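The branch-versus-cmov trade-off discussed above can be sketched in C. This is an illustration under stated assumptions, not code from the thread: the function names `sum_branchy` and `sum_branchless` and the three byte arrays are hypothetical. `sum_branchy` follows the left-hand column of Ed's example and skips both byte loads when bit 0 of the flag is clear. `sum_branchless` always does the loads, as the right-hand column does, and then keeps or discards the result with an AND mask. The mask is swapped in for cmov so that the source contains no jump that depends on the flag data; whether a given compiler emits cmov, a mask, or a conditional jump for either loop is up to the compiler, so check the generated assembly before timing anything. Assigning each `uint8_t` to a `uint32_t` is the C counterpart of `movzx eax,byte ptr [ecx]`, which is what compilers typically emit for it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example, not from the thread.
 * Branchy version, like the left-hand column: when bit 0 of flags[i]
 * is clear, the loop jumps over both loads. */
uint32_t sum_branchy(const uint8_t *flags, const uint8_t *a,
                     const uint8_t *b, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (flags[i] & 1u) {
            /* Reading a byte into a 32-bit variable is a zero-extending
             * load (typically movzx), so no partial register such as
             * al or bl is written and later read. */
            uint32_t x = a[i];
            uint32_t y = b[i];
            sum += x + y;
        }
    }
    return sum;
}

/* Branchless version: the loads always happen (like the right-hand
 * column), and an all-ones or all-zeros mask decides whether the
 * result counts. No jump depends on the flag data; only the loop
 * counter branch remains. */
uint32_t sum_branchless(const uint8_t *flags, const uint8_t *a,
                        const uint8_t *b, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t x = a[i];
        uint32_t y = b[i];
        /* Unsigned wrap-around is well defined:
         * 0 - 1 gives 0xFFFFFFFF, 0 - 0 gives 0. */
        uint32_t mask = (uint32_t)0u - (uint32_t)(flags[i] & 1u);
        sum += (x + y) & mask;
    }
    return sum;
}
```

For instance, with flags `{1, 0, 3, 2}`, `a = {10, 20, 30, 40}` and `b = {1, 2, 3, 4}`, both functions return 44 (10 + 1 plus 30 + 3). On random flags the branchless loop avoids the mispredictions Gunther describes; on highly predictable flags the branchy loop can be faster because it skips the loads whenever the flag is clear, which is Tony's 1 versus 1.5 moves argument.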
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.