Computer Chess Club Archives



Subject: Re: Expert Assembler Question

Author: Gerd Isenberg

Date: 02:36:51 08/27/05



On August 27, 2005 at 04:34:03, Ed Schröder wrote:

>On August 27, 2005 at 00:43:29, Tony Werten wrote:
>
>>On August 26, 2005 at 18:12:30, Ed Schröder wrote:
>>
>>>I am no longer up-to-date regarding the newest processors (such as the AMD-64)
>>>and the internal working concerning speed, hence my question:
>>>
>>>Which (similar) code is faster?
>>>
>>>       test    byte ptr xxx,1    |        test    byte ptr xxx,1
>>>       je      label             |        mov     AL,[ECX]
>>>       mov     AL,[ECX]          |        je      label
>>>       mov     BL,[EDX]          |        mov     BL,[EDX]
>>>       ...     ........          |        ...     ........
>>>       ...     ........          |        ...     ........
>>>label:                           | label:
>>>
>>>Thanks in advance,
>>
>>Hi Ed,
>
>Hey Tony,
>
>
>>probably not what you wanted to know, but the two code fragments differ quite
>>a bit.
>>
>>If the jump condition is met 50% of the time, then the left code executes
>>both moves 50% of the time, for an average of 1 move per loop, while the right
>>side executes 100% + 50%, i.e. 1.5 moves per loop on average.
>>
>>Did you mean something else ?
>
>Yep :)
>
>The background of my question is the processor's ability to execute 2 instructions
>at the same time. Following this logic, the code on the right is (in principle)
>supposed to be faster.
>
>
>
>>2 BTW's:
>>
>>1 Depending on what you do with AL and BL, you might want to use the full
>>registers by doing movzx eax,byte ptr [ecx] and movzx ebx,byte ptr [edx] (no
>>penalty on new processors).
>
>That's good to know, thank you.


Yes, loading bytes into partial registers is expensive anyway: 4 cycles latency.

MOV reg8, mem8  8Ah  mm-xxx-xxx  DirectPath  4
MOV AL, mem8    A0h              DirectPath  4

Tony is right: zero-extending into AX, EAX or RAX with MOVZX also takes 4 cycles, so it costs nothing extra.

MOVZX reg16/32/64, mem8 0Fh B6h mm-xxx-xxx DirectPath 4
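In C, the MOVZX advice corresponds to reading an unsigned char into an unsigned int. A minimal sketch, assuming a hypothetical one-byte-per-square array `board8` and helper `piece_at8` (neither comes from the thread); x86 compilers typically compile such a load to MOVZX into a full register, though the exact instruction chosen is up to the compiler.

```c
#include <assert.h>

/* Hypothetical board: one byte per square, piece codes 0..255. */
static unsigned char board8[64];

/* Converting an unsigned char to unsigned int zero-extends the value.
   x86 compilers typically emit MOVZX eax, byte ptr [...] for this load,
   writing the full register instead of only AL (not guaranteed). */
static unsigned int piece_at8(int sq)
{
    assert(sq >= 0 && sq < 64);
    return board8[sq];
}
```

Because the upper bits are cleared, a byte value of 0xFF comes back as 255 rather than leaving stale bits from earlier register contents.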

If you have some "global", very frequently used array (e.g. array[64]), it might be worth
wasting some memory (e.g. four cache lines instead of one) and switching to the native
32-bit int size:

MOV reg16/32/64, mreg16/32/64 8Bh 11-xxx-xxx DirectPath 1
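The memory side of that trade-off can be made concrete. A sketch assuming 64-byte cache lines and a 4-byte int (typical for Athlon 64/Opteron, but assumptions here), with hypothetical tables `table8`/`table32` and helpers `cache_lines`/`lookup32`: a 64-entry byte table fits in one line, while the same table with 32-bit entries spans four.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines; adjust for other targets. */
#define CACHE_LINE 64

/* Hypothetical "global", frequently used 64-entry tables, both aligned
   to a cache-line boundary. */
static _Alignas(CACHE_LINE) unsigned char table8[64]; /* 64 bytes        */
static _Alignas(CACHE_LINE) int table32[64];          /* 256 bytes (4-byte int) */

/* Number of cache lines spanned by an object of `bytes` bytes that
   starts on a cache-line boundary. */
static size_t cache_lines(size_t bytes)
{
    return (bytes + CACHE_LINE - 1) / CACHE_LINE;
}

/* Entries of table32 are loaded with a plain 32-bit MOV into a full
   register; no byte load or zero extension is needed. */
static int lookup32(int i)
{
    assert(i >= 0 && i < 64);
    return table32[i];
}
```

Whether the cheaper load outweighs the larger cache footprint depends on how hot the table is, so it is worth measuring before switching.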

Also, avoid the shorter but redundant EAX-Move encoding:

MOV AX/EAX/RAX, mem16/32/64 A1h DirectPath 4/3/3

Gerd

>
>
>>2 This kind of code might be helped a lot by conditional moves (all new
>>processors support them). A conditional move basically does an "if (cond) mov eax,xx"
>>without a branch, and so without the risk of branch mispredictions (very, very
>>expensive on new processors).
>
>I am aware of the cmov instructions, but their use is very limited. Have there been
>new extensions lately?
>
>Thanks,
>
>Ed
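Tony's conditional-move suggestion can also be expressed at the C level. A sketch with hypothetical helper names: `select_ternary` is the form that compilers often (not always) turn into CMOVcc when optimising, and `select_mask` is an explicitly branch-free, mask-based alternative, a different technique from CMOV that produces the same result.

```c
#include <assert.h>
#include <stdint.h>

/* Plain conditional select. With optimisation enabled, x86 compilers
   often (but not always) emit CMOVcc here instead of a branch. */
static int32_t select_ternary(int cond, int32_t a, int32_t b)
{
    return cond ? a : b;
}

/* Branch-free select via a mask: all ones when cond is non-zero,
   all zeros otherwise. The final conversion back to int32_t relies on
   two's-complement behaviour (implementation-defined before C23, but
   what GCC, Clang and MSVC do). */
static int32_t select_mask(int cond, int32_t a, int32_t b)
{
    uint32_t mask = (uint32_t)0 - (uint32_t)(cond != 0);
    return (int32_t)(((uint32_t)a & mask) | ((uint32_t)b & ~mask));
}
```

Neither version helps if the selected value is expensive to compute, since both operands are evaluated before the selection.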




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.