Author: Gerd Isenberg
Date: 10:54:07 07/05/03
On July 05, 2003 at 13:48:21, Vincent Diepeveen wrote:

>On July 05, 2003 at 12:22:29, Gerd Isenberg wrote:
>
>>On July 05, 2003 at 10:17:38, Omid David Tabibi wrote:
>>
>>>In Genesis I heavily use the abs() function, and so tried to optimize it.
>>>Instead of using the abs() function defined in <math.h>, I wrote the following
>>>function:
>>>
>>>long abs(long x) {
>>>    long y;
>>>    y = x >> 31;
>>>    return (x ^ y) - y;
>>>}
>>>
>>>Testing it using a profiler, I found out that my implementation is about twice
>>>as slow as the math.h implementation of abs(). I haven't looked at the
>>>implementation in math.h, but I can't see how a more optimized version of abs()
>>>can be written.
>>>
>>>Any ideas?
>>
>>I guess the x86 math.h implementation of abs() uses a conditional mov instruction
>>like this one (x in eax):
>>
>>  mov   edx, eax ; x
>>  neg   eax      ; -x
>>  cmp   eax, edx ; (-x) - x
>>  cmovl eax, edx ; (-x) < x ? x : -x
>>
>>To compare, your code in asm with x in eax:
>>
>>  mov   edx, eax ; x
>>  sar   edx, 31  ; y = x >> 31
>>  xor   eax, edx ; x^y
>>  sub   eax, edx ; (x^y)-y
>
>How is 32-bit shifting going to run fast on x86-64?

It seems to be fast:

Software Optimization Guide for AMD Athlon™ 64 and AMD Opteron™ Processors

                                                     Latency  Note
SAR mreg16/32/64, imm8  C1h  11-111-xxx  DirectPath     1       3

3. The clock count, regardless of the number of shifts or rotates, as
determined by CL or imm8.

>
>>Hmm... I wouldn't expect yours to be so much slower - interesting.
>>Maybe, as Vincent already mentioned, it is the "slow" arithmetic shift
>>instruction on P4 and more dependencies. The cmov approach also needs only two
>>ALU instructions (neg, cmp), whereas your approach needs three.
>>
>>Gerd
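
For anyone who wants to try the two approaches side by side, here is a minimal C sketch. The names abs_shift, abs_select and check_abs_variants are made up for illustration and come from neither Genesis nor any library. It assumes a 32-bit operand (int32_t rather than long) and that >> on a negative value is an arithmetic shift, which the C standard leaves implementation-defined. Whether a compiler actually emits neg/cmp/cmov for the select version depends on the compiler and its flags, so it is worth looking at the generated asm rather than trusting the source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shift-based variant from the quoted post, with the width made explicit.
   Assumption: >> on a negative int32_t shifts in sign bits (arithmetic shift). */
static int32_t abs_shift(int32_t x)
{
    int32_t y = x >> 31;   /* 0 when x >= 0, -1 (all ones) when x < 0 */
    return (x ^ y) - y;    /* x unchanged, or (~x) + 1, which is -x */
}

/* Select-based variant mirroring the neg/cmp/cmov sequence.
   A compiler may or may not turn the ternary into cmov. */
static int32_t abs_select(int32_t x)
{
    int32_t n = -x;
    return (n < x) ? x : n;
}

/* Checks both variants against a plain reference on a few samples.
   INT32_MIN is left out on purpose: -x overflows there in both variants. */
static void check_abs_variants(void)
{
    static const int32_t samples[] = { 0, 1, -1, 42, -42, INT32_MAX, -INT32_MAX };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; ++i) {
        int32_t x = samples[i];
        int32_t expected = (x < 0) ? -x : x;
        assert(abs_shift(x) == expected);
        assert(abs_select(x) == expected);
    }
}
```

Calling check_abs_variants() from a small main() only confirms that the two variants agree on the sampled values. It says nothing about speed, which would still need a profiler run on the target CPU.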
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.