Computer Chess Club Archives

Subject: Re: Expert Assembler Question

Author: Gunther Piez
Date: 12:56:03 08/27/05
On August 27, 2005 at 04:34:03, Ed Schröder wrote:

>On August 27, 2005 at 00:43:29, Tony Werten wrote:
>
>>On August 26, 2005 at 18:12:30, Ed Schröder wrote:
>>
>>>I am no longer up-to-date regarding the newest processors (such as the AMD-64)
>>>and the internal working concerning speed, hence my question:
>>>
>>>Which (similar) code is faster?
>>>
>>>       test    byte ptr xxx,1    |        test    byte ptr xxx,1
>>>       je      label             |        mov     AL,[ECX]
>>>       mov     AL,[ECX]          |        je      label
>>>       mov     BL,[EDX]          |        mov     BL,[EDX]
>>>       ...     ........          |        ...     ........
>>>       ...     ........          |        ...     ........
>>>label:                           | label:
>>>
>>>Thanks in advance,
>>
>>Hi Ed,
>
>Hey Tony,
>
>
>>probably not what you wanted to know, but the code is quite different from each
>>other.
>>
>>If the jump condition is met 50% of the time, then the left code will execute
>>the 2 moves 50% of the time for an average of 1 move per loop and the right side
>>100%+50% is 1.5 moves per loop on average.
>>
>>Did you mean something else ?
>
>Yep :)
>
>The background of my question is the processor's capability to do 2 instructions
>at the same time. Following this logic the code on the right (in principle) is
>supposed to be faster.

Actually, for quite some time it has been three instructions per cycle, at least
on the Pentium Pro and later and on the Athlon. Internal parallelization of
instructions is better on Athlons, though. For instruction sequences which
contain only a few loads/stores (no more than 1 every 3 insns on average) it is
quite often possible to do more than 2 insns per cycle.
>>
>>1 Depending on what you do with AL and BL, you might want to use the full
>>registers by doing movzx eax,[ecx] and movzx ebx,[edx] (No penalty on new
>>processors)
>
I second that. Not only is "movzx" as fast as "mov", you will also get
huge penalties if you read from a register (or memory) which has only been
partially written before. Search for "false dependency" in the amd64 manual. I
recommend avoiding al, ah, ax... and so on completely :-)
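
If the code is written in C rather than by hand, the same advice roughly
translates to "widen the byte as soon as it is loaded". A minimal sketch; the
function names are made up for illustration, and the claim that a compiler
emits movzx for the load is an assumption about typical x86 code generation,
not a guarantee:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: load one byte and zero-extend it to 32 bits.
   On x86 a compiler will typically turn this into a movzx (assumption),
   so the whole register is written and no partial write is left behind. */
uint32_t load_byte_zx(const uint8_t *p)
{
    assert(p != NULL);
    return (uint32_t)*p;
}

/* Add two bytes using full-width values only, with no 8-bit temporaries.
   The result can exceed 255 because the addition happens in 32 bits. */
uint32_t sum_two_bytes(const uint8_t *a, const uint8_t *b)
{
    uint32_t x = load_byte_zx(a);
    uint32_t y = load_byte_zx(b);
    return x + y;
}
```

Whether this really avoids the penalty depends on what the compiler does with
it, so it is worth checking the generated assembly.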

>I am aware of the cmove instruction but its use is very limited,
For short conditional expressions depending on completely random data,
evaluating the expression every time (before the jump) followed by a bunch of
cmovs may be faster than a branch that is mispredicted 50% of the time (about
10 cycles penalty on Athlon, even more on P4).
But I admit, most of the time it isn't very useful.
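
To make the "compute always, then select" idea concrete, here is a rough C
sketch. All names are invented for the example, and whether the ternary in
pick_select actually becomes a cmov is up to the compiler and its options, so
treat that as an assumption:

```c
#include <assert.h>
#include <stdint.h>

/* Branchy version: the work is skipped when bit 0 is clear, like the
   test/je sequence in the question. Cheap if the branch predicts well. */
uint32_t pick_branchy(uint32_t flags, uint32_t a, uint32_t b, uint32_t fallback)
{
    if ((flags & 1u) == 0)
        return fallback;
    return a + b;
}

/* Select version: the expression is always evaluated, then one of the two
   values is chosen. A compiler may emit cmov for this (assumption). */
uint32_t pick_select(uint32_t flags, uint32_t a, uint32_t b, uint32_t fallback)
{
    uint32_t computed = a + b;
    return (flags & 1u) ? computed : fallback;
}

/* Both versions must return the same value; only the timing can differ. */
void check_agreement(void)
{
    for (uint32_t flags = 0; flags < 4; ++flags)
        for (uint32_t a = 0; a < 8; ++a)
            assert(pick_branchy(flags, a, a + 1, 99u) ==
                   pick_select(flags, a, a + 1, 99u));
}
```

The select version does more work on average, so it only pays off when the
flag is close to random and the computed expression is short.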

>have there been
>new extensions lately?

64 bit?
SSE?

:-)