Computer Chess Club Archives


Subject: Re: Implementation of the abs() function [o.t.]

Author: Vincent Diepeveen

Date: 11:08:48 07/05/03


On July 05, 2003 at 13:54:07, Gerd Isenberg wrote:

>On July 05, 2003 at 13:48:21, Vincent Diepeveen wrote:
>
>>On July 05, 2003 at 12:22:29, Gerd Isenberg wrote:
>>
>>>On July 05, 2003 at 10:17:38, Omid David Tabibi wrote:
>>>
>>>>In Genesis I heavily use the abs() function, and so tried to optimize it.
>>>>Instead of using the abs() function defined in <math.h>, I wrote the following
>>>>function:
>>>>
>>>>long abs(long x) {
>>>>    long y;
>>>>    y = x >> 31;
>>>>    return (x ^ y) - y;
>>>>}
>>>>
>>>>Testing it using a profiler, I found out that my implementation is about twice
>>>>as slow as the math.h implementation of abs(). I haven't looked at the
>>>>implementation in math.h, but I can't see how a more optimized version of abs()
>>>>can be written.
>>>>
>>>>Any ideas?
>>>
>>>I guess the x86 math.h implementation of abs() uses a conditional move
>>>instruction, like this (x in eax):
>>>
>>>	mov   edx, eax    ; x
>>>	neg   eax         ; -x
>>>	cmp   eax, edx    ; (-x) - x
>>>	cmovl eax, edx    ; x < (-x) ? -x : x
>>>
>>>Compare that with your code in asm, with x in eax:
>>>
>>>	mov   edx, eax    ; x
>>>	sar   edx, 31     ; y = x >> 31
>>>	xor   eax, edx    ; x^y
>>>	sub   eax, edx    ;(x^y)-y
>>
>>How is a 32-bit shift going to run fast on x86-64?
>
>seems to be fast:
>
>Software Optimization
>Guide for AMD Athlon™ 64
>and
>AMD Opteron™ Processors
>
>                                                 Latency  Note
>SAR mreg16/32/64, imm8 C1h 11-111-xxx DirectPath 1        3

Of course the instructions are fast. But I'm planning to use 16 GPRs and 64-bit
variables on it. So if you shift 0x739394abcde12345 31 bits to the right, you have
a major problem: with a 64-bit variable you need to shift by 63 instead.
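
The problem can be sketched in C roughly as follows. This is only an illustration, assuming int64_t from <stdint.h> and a compiler that shifts negative values right arithmetically (the C standard leaves that implementation-defined). The names abs64 and abs64_shift31 are made up here; the second one keeps the hard-coded 31 to show where it goes wrong.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: branchless abs for a 64-bit operand.
   The shift count follows the operand width (63 here) instead of being fixed at 31.
   Assumes >> on a negative value is an arithmetic shift (implementation-defined). */
static int64_t abs64(int64_t x) {
    assert(x != INT64_MIN);                     /* -INT64_MIN is not representable */
    int64_t y = x >> (sizeof x * CHAR_BIT - 1); /* 0 if x >= 0, -1 (all ones) if x < 0 */
    return (x ^ y) - y;
}

/* The same formula with the 32-bit shift count kept by mistake. */
static int64_t abs64_shift31(int64_t x) {
    int64_t y = x >> 31;                        /* not a sign mask for a 64-bit x */
    return (x ^ y) - y;
}
```

For 0x739394abcde12345, abs64 returns the value unchanged, while abs64_shift31 does not, because x >> 31 is neither 0 nor -1 for that input.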

I wonder how the 32-bit assembly can be used at all then. 64 bits x 16
registers is quite a bit faster than that. I will be using C code only.

For that matter, I am against using 'long' anyway. Instead of 'long' I use 'int'.
On Alpha 'long' is 64 bits, on x86 it is 32 bits. 'int', however, is 32 bits on most
hardware, and you don't run the risk of needing to put an 'L' on every comparison
with a number.
  if( x == 5 )  // some compilers accept this. no matter what ansi says

  if( x == 5L ) // works correct

So using 'long' is a bad idea.

On the other hand, I do of course use 'long long' for 64 bits in GCC, and
hopefully in the future also with newer Windows compilers, once they follow the new
standards.
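
One way to make these width assumptions explicit is sketched below. It assumes a C11 compiler (static_assert from <assert.h>) and <stdint.h>; the typedef names score_t and word64_t and the helper long_bits are hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Fail at compile time, rather than misbehave, where int is not 32 bits. */
static_assert(sizeof(int) * CHAR_BIT == 32, "this code assumes a 32-bit int");

typedef int32_t score_t;   /* exactly 32 bits wherever int32_t exists */
typedef int64_t word64_t;  /* exactly 64 bits wherever int64_t exists */

/* Width in bits of 'long' on the platform being compiled for:
   typically 32 on 32-bit x86 and 64 on Alpha. */
static int long_bits(void) {
    return (int)(sizeof(long) * CHAR_BIT);
}
```

The fixed-width typedefs keep the same size when the code moves between platforms, whereas the result of long_bits() changes with the target.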

I cannot see any reason for a self-defined abs implementation. Trying to
outsmart the compiler guys in assembly is usually a bad idea and a waste of time
unless you have a million-dollar reason to do it (like slow single chips in
medical applications).
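
For comparison, a small sketch of leaning on the standard library instead, assuming a C99 library that provides llabs. The wrapper names are hypothetical. abs, labs and llabs are declared in <stdlib.h>, and a compiler is free to expand them inline.

```c
#include <stdlib.h>

/* Hypothetical wrappers: pick the library routine that matches the operand type,
   so no shift count has to be hard-coded anywhere. */
static int       abs_int(int x)        { return abs(x);   }
static long      abs_long(long x)      { return labs(x);  }
static long long abs_ll(long long x)   { return llabs(x); }
```

As with any abs, the result is undefined for the most negative value of each type.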

>3. The clock count, regardless of the number of shifts or rotates, as determined
>by CL or imm8.

>>
>>>Hmm... I wouldn't expect yours to be so much slower. Interesting.
>>>Maybe, as Vincent already mentioned, it is the "slow" arithmetic shift instruction on
>>>the P4 and more dependencies. The cmov approach also needs only two
>>>ALU instructions (neg, cmp), whereas your approach needs three.
>>>
>>>Gerd


