Author: Gerd Isenberg
Date: 10:54:07 07/05/03
On July 05, 2003 at 13:48:21, Vincent Diepeveen wrote:
>On July 05, 2003 at 12:22:29, Gerd Isenberg wrote:
>
>>On July 05, 2003 at 10:17:38, Omid David Tabibi wrote:
>>
>>>In Genesis I heavily use the abs() function, and so tried to optimize it.
>>>Instead of using the abs() function defined in <math.h>, I wrote the following
>>>function:
>>>
>>>long abs(long x) {
>>> long y;
>>> y = x >> 31;
>>> return (x ^ y) - y;
>>>}
>>>
>>>Testing it using a profiler, I found out that my implementation is about twice
>>>as slow as the math.h implementation of abs(). I haven't looked at the
>>>implementation in math.h, but I can't see how a more optimized version of abs()
>>>can be written.
>>>
>>>Any ideas?
>>
>>I guess the x86 math.h implementation of abs() uses a conditional move instruction,
>>like this one (x in eax):
>>
>> mov edx, eax ; x
>> neg eax ; -x
>> cmp eax, edx ; (-x) - x
>> cmovl eax, edx ; (-x) < x ? x : -x
>>
>>For comparison, your code in asm, with x in eax:
>>
>> mov edx, eax ; x
>> sar edx, 31 ; y = x >> 31
>> xor eax, edx ; x^y
>> sub eax, edx ;(x^y)-y
>
>How is 32 bits shifting going to run fast at x86-64?
It seems to be fast. From the "Software Optimization Guide for AMD Athlon™ 64
and AMD Opteron™ Processors":

Syntax                   First byte  ModRM byte  Decode type  Latency  Note
SAR mreg16/32/64, imm8   C1h         11-111-xxx  DirectPath   1        3

Note 3: The clock count, regardless of the number of shifts or rotates, as
determined by CL or imm8.
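
A related point for x86-64: if long becomes 64 bits wide there, a hard-coded
shift count of 31 no longer produces a full sign mask. A minimal sketch of a
width-independent variant follows. The name abs_branchless is made up for
illustration, and the code assumes that >> on a negative signed value is an
arithmetic shift (implementation-defined in C, but what x86 compilers generate
with SAR). Like abs() itself, it is not meaningful for LONG_MIN.

```c
#include <limits.h>

/* Hypothetical helper, not taken from Genesis or from math.h. */
static long abs_branchless(long x)
{
    /* Shift by (bit width - 1) instead of a fixed 31, so the mask is
       right for both 32-bit and 64-bit long.
       Assumes an arithmetic right shift for negative values. */
    long y = x >> (sizeof(long) * CHAR_BIT - 1);  /* 0 if x >= 0, -1 if x < 0 */

    /* For y == 0 this returns x unchanged; for y == -1 it computes
       (~x) + 1, i.e. the two's complement negation of x. */
    return (x ^ y) - y;
}
```

The shift count is a compile-time constant, so this should still compile to a
single SAR with an immediate operand.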
>
>>Hmm... I wouldn't expect yours to be so much slower - interesting.
>>Maybe, as Vincent already mentioned, it is the "slow" arithmetic shift instruction on
>>P4 plus more dependencies. The cmov approach also needs only two
>>ALU instructions (neg, cmp), whereas your approach needs three.
>>
>>Gerd
Last modified: Thu, 15 Apr 21 08:11:13 -0700