Computer Chess Club Archives


Subject: Re: Implementation of the abs() function [o.t.]

Author: Gerd Isenberg

Date: 13:15:57 07/05/03


On July 05, 2003 at 15:47:41, Dieter Buerssner wrote:

>On July 05, 2003 at 12:22:29, Gerd Isenberg wrote:
>
>>On July 05, 2003 at 10:17:38, Omid David Tabibi wrote:
>>
>>>In Genesis I heavily use the abs() function, and so tried to optimize it.
>>>Instead of using the abs() function defined in <math.h>, I wrote the following
>>>function:
>>>
>>>long abs(long x) {
>>>    long y;
>>>    y = x >> 31;
>>>    return (x ^ y) - y;
>>>}
>>>
>>>Testing it using a profiler, I found out that my implementation is about twice
>>>as slow as the math.h implementation of abs(). I haven't looked at the
>>>implementation in math.h, but I can't see how a more optimized version of abs()
>>>can be written.
>>>
>>>Any ideas?
>>
>>I guess the x86 math.h implementation of abs() uses a conditional move (cmov)
>>instruction, like this (x in eax):
>
>It wouldn't run on Pentium then?
>
>>	mov   edx, eax    ; x
>>	neg   eax         ; -x
>>	cmp   eax, edx    ; (-x) - x
>>	cmovl eax, edx    ; x < (-x) ? -x : x
>>
>>For comparison, your code in asm with x in eax:
>>
>>	mov   edx, eax    ; x
>>	sar   edx, 31     ; y = x >> 31
>
>Or just (instead of the 2 instructions):
>        cdq               ; edx:eax = x (eax) sign extended
>
>But it leaves the compiler no choice of registers, while your code
>would run with any register pair.
>
>>	xor   eax, edx    ; x^y
>>	sub   eax, edx    ;(x^y)-y
>>
>>Hmm... I wouldn't expect yours to be so much slower - interesting.
>>Maybe, as Vincent already mentioned, it is the "slow" arithmetic shift
>>instruction on P4 plus more dependencies. The cmov approach also needs only two
>>ALU instructions (neg, cmp), whereas your approach needs three.
>>
>>Gerd
>
>Regards,
>Dieter

Hi Dieter,

You are right. I inspected the assembler output of the abs intrinsic:

00408398 99                   cdq
00408399 33 C2                xor         eax,edx
0040839B 2B C2                sub         eax,edx

Only 5 bytes for abs, wow!

The compiler possibly doesn't "understand" the semantics of "y = x >> 31" well
enough to translate it into a single cdq instruction.

Regards,
Gerd
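
For readers who want to try the two variants discussed in this thread, here is a minimal C sketch. It is not the code of any particular compiler's abs intrinsic. The function names abs_sign_mask, abs_select and check_abs_variants are made up for illustration. The sketch assumes a 32-bit value and that >> on a negative signed integer is an arithmetic shift, which is implementation-defined in C but is what common x86 compilers do. Note that in standard C, abs() is declared in <stdlib.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: the sign-mask idiom from the thread.
   Assumes >> on a negative int32_t is an arithmetic shift. */
int32_t abs_sign_mask(int32_t x)
{
    int32_t y = x >> 31;   /* 0 if x >= 0, -1 (all ones) if x < 0; the value cdq puts in edx */
    return (x ^ y) - y;    /* x >= 0: x unchanged; x < 0: (~x) + 1, i.e. -x */
}

/* Hypothetical helper: the select form that a cmov sequence computes. */
int32_t abs_select(int32_t x)
{
    return x < 0 ? -x : x;
}

/* Compares both variants with the library abs() on a few samples.
   INT32_MIN is left out on purpose: its absolute value does not fit
   in int32_t, so negating it overflows. */
void check_abs_variants(void)
{
    static const int32_t samples[] = { 0, 1, -1, 42, -42, INT32_MAX, -INT32_MAX };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; ++i) {
        assert(abs_sign_mask(samples[i]) == abs_select(samples[i]));
        assert(abs_sign_mask(samples[i]) == abs((int)samples[i]));
    }
}
```

Calling check_abs_variants() from a test program exercises both helpers. Using int32_t instead of long keeps the shift count of 31 correct on platforms where long is 64 bits wide.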



Last modified: Thu, 15 Apr 21 08:11:13 -0700
