Computer Chess Club Archives



Subject: Re: Implementation of the abs() function [o.t.]

Author: Gerd Isenberg

Date: 12:06:16 07/05/03



On July 05, 2003 at 14:08:48, Vincent Diepeveen wrote:

>On July 05, 2003 at 13:54:07, Gerd Isenberg wrote:
>
>>On July 05, 2003 at 13:48:21, Vincent Diepeveen wrote:
>>
>>>On July 05, 2003 at 12:22:29, Gerd Isenberg wrote:
>>>
>>>>On July 05, 2003 at 10:17:38, Omid David Tabibi wrote:
>>>>
>>>>>In Genesis I heavily use the abs() function, and so tried to optimize it.
>>>>>Instead of using the abs() function defined in <math.h>, I wrote the following
>>>>>function:
>>>>>
>>>>>long abs(long x) {
>>>>>    long y;
>>>>>    y = x >> 31;
>>>>>    return (x ^ y) - y;
>>>>>}
>>>>>
>>>>>Testing it using a profiler, I found out that my implementation is about twice
>>>>>slower than the math.h implementation of abs(). I haven't looked at the
>>>>>implementation in math.h, but I can't see how a more optimized version of abs()
>>>>>can be written.
>>>>>
>>>>>Any ideas?
>>>>
>>>>I guess the x86 math.h implementation of abs() uses a conditional move instruction
>>>>like this one (x in eax):
>>>>
>>>>	mov   edx, eax    ; x
>>>>	neg   eax         ; -x
>>>>	cmp   eax, edx    ; (-x) - x
>>>>	cmovl eax, edx    ; x < (-x) ? -x : x
>>>>
>>>>For comparison, your code in asm with x in eax:
>>>>
>>>>	mov   edx, eax    ; x
>>>>	sar   edx, 31     ; y = x >> 31
>>>>	xor   eax, edx    ; x^y
>>>>	sub   eax, edx    ;(x^y)-y
>>>
>>>How is 32 bits shifting going to run fast at x86-64?
>>
>>seems to be fast:
>>
>>Software Optimization
>>Guide for AMD Athlon™ 64
>>and
>>AMD Opteron™ Processors
>>
>>                                                 Latency  Note
>>SAR mreg16/32/64, imm8 C1h 11-111-xxx DirectPath 1        3
>
>Of course the instructions are fast. But I'm planning to use 16 GPRs and 64-bit
>variables with it. So if you shift 0x739394abcde12345 right by 31 bits,
>you have a major problem, because with 64-bit values you need to shift by 63.

Regardless of the number of shifts or rotates!
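
On the shift-count point, a minimal C sketch of the same branchless trick for 64-bit values, deriving the shift from the operand width instead of hard-coding 31. The name abs64 is hypothetical, and the code assumes that >> on a negative signed value is an arithmetic shift (implementation-defined in C, but what sar does on x86):

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper, not from the thread: branchless abs for 64-bit
   values. Assumes >> on a negative signed value is an arithmetic shift. */
static uint64_t abs64(int64_t x)
{
    /* Shift by (width - 1), i.e. 63 here, so the mask is all ones for
       negative x and zero otherwise. */
    uint64_t mask = (uint64_t)(x >> (sizeof x * CHAR_BIT - 1));

    /* Unsigned arithmetic wraps, so INT64_MIN does not overflow. */
    return ((uint64_t)x ^ mask) - mask;
}
```

Returning an unsigned result is a design choice of this sketch: it lets the magnitude of INT64_MIN be represented, which a signed return type could not do.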


>
>I wonder how the 32 bits assembly can be used anyway then. 64 bits x 16
>registers is quite some faster than that. I will be using C code only.
>
>Trivially i am against using 'long' anyway. Instead of 'long' i use 'int'.
>At alpha 'long' is 64 bits, at x86 it is 32 bits. Int is 32 bits however at most
>hardware and you don't run the risk that you need to put a 'L' at every compare
>with a number.
>  if( x == 5 )  // some compilers accept this. no matter what ansi says
>
>  if( x == 5L ) // works correct
>
>So using 'long' is a bad idea.
>
>On the other hand i do use 'long long' for 64 bits of course in GCC and
>hopefully in the future also at newer windows compilers when they follow the new
>standards.
>
>I can not possibly see a reason for a self-defined abs implementation. Trying to
>outsmart the compiler guys in assembly usually is a bad idea and a waste of time
>unless you have a million dollar reason to do it (like at slow single chips for
>medical applications).
>

Yes, unless there is some abs macro with a conditional branch.
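
For illustration, a macro of the kind meant here might look like the sketch below. The names ABS_MACRO and macro_matches_library are hypothetical, and whether the ternary ends up as a conditional branch or a cmov depends on the compiler and its flags:

```c
#include <stdlib.h>

/* Hypothetical abs macro with a conditional; the name is illustrative.
   Note that the argument is evaluated twice, so side effects in x
   (such as i++) would be applied twice. */
#define ABS_MACRO(x) ((x) < 0 ? -(x) : (x))

/* Compares the macro against the library abs() from <stdlib.h>.
   Like abs(), the macro is undefined for INT_MIN. */
static int macro_matches_library(int x)
{
    return ABS_MACRO(x) == abs(x);
}
```

Inspecting the compiler's assembly output for such a macro is probably the quickest way to see whether a branch is really generated.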


>>3. The clock count, regardless of the number of shifts or rotates, as determined
>>by CL or imm8.
>
>>>
>>>>Hmm... I wouldn't expect yours to be so much slower - interesting.
>>>>Maybe, as Vincent already mentioned, it is the "slow" arithmetic shift instruction on
>>>>P4 and more dependencies. The cmov approach also needs only two
>>>>ALU instructions (neg, cmp), whereas your approach needs three.
>>>>
>>>>Gerd
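
The two instruction sequences compared in this thread correspond roughly to the following C, given here as a sketch only. The function names are hypothetical, a compiler is free to emit different code for either one, and the second function assumes an arithmetic right shift:

```c
#include <stdint.h>

/* Mirrors mov/neg/cmp/cmovl: compute -x, then keep the larger value.
   Undefined for INT32_MIN, because -x overflows. */
static int32_t abs_cmov_style(int32_t x)
{
    int32_t neg = -x;            /* neg eax */
    return (neg < x) ? x : neg;  /* cmp eax, edx + cmovl eax, edx */
}

/* Mirrors mov/sar/xor/sub from the original post.
   Assumes >> on a negative value is an arithmetic shift. */
static int32_t abs_sar_style(int32_t x)
{
    int32_t y = x >> 31;   /* sar edx, 31: 0 or -1 */
    return (x ^ y) - y;    /* xor eax, edx; sub eax, edx */
}
```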




Last modified: Thu, 15 Apr 21 08:11:13 -0700
