Author: Gerd Isenberg
Date: 12:06:16 07/05/03
On July 05, 2003 at 14:08:48, Vincent Diepeveen wrote:

>On July 05, 2003 at 13:54:07, Gerd Isenberg wrote:
>
>>On July 05, 2003 at 13:48:21, Vincent Diepeveen wrote:
>>
>>>On July 05, 2003 at 12:22:29, Gerd Isenberg wrote:
>>>
>>>>On July 05, 2003 at 10:17:38, Omid David Tabibi wrote:
>>>>
>>>>>In Genesis I heavily use the abs() function, and so tried to optimize it.
>>>>>Instead of using the abs() function defined in <math.h>, I wrote the following
>>>>>function:
>>>>>
>>>>>long abs(long x) {
>>>>>    long y;
>>>>>    y = x >> 31;
>>>>>    return (x ^ y) - y;
>>>>>}
>>>>>
>>>>>Testing it using a profiler, I found out that my implementation is about twice
>>>>>as slow as the math.h implementation of abs(). I haven't looked at the
>>>>>implementation in math.h, but I can't see how a more optimized version of abs()
>>>>>can be written.
>>>>>
>>>>>Any ideas?
>>>>
>>>>I guess the x86 math.h implementation of abs() uses a conditional mov instruction
>>>>like this one (x in eax):
>>>>
>>>>  mov   edx, eax ; x
>>>>  neg   eax      ; -x
>>>>  cmp   eax, edx ; (-x) - x
>>>>  cmovl eax, edx ; (-x) < x ? x : -x
>>>>
>>>>to compare your code in asm with x in eax:
>>>>
>>>>  mov   edx, eax ; x
>>>>  sar   edx, 31  ; y = x >> 31
>>>>  xor   eax, edx ; x^y
>>>>  sub   eax, edx ; (x^y)-y
>>>
>>>How is 32 bits shifting going to run fast at x86-64?
>>
>>seems to be fast:
>>
>>Software Optimization
>>Guide for AMD Athlon™ 64
>>and
>>AMD Opteron™ Processors
>>
>>                                                  Latency  Note
>>SAR mreg16/32/64, imm8  C1h  11-111-xxx  DirectPath  1        3
>
>Of course the instructions are fast. But i'm planning to use 16 GPRs and 64 bits
>variables at it. So if you shift 0x739394abcde12345 then 31 bits to the right
>you got yourself a major problem as you need to shift 63 then.

Regardless of the number of shifts or rotates!

>
>I wonder how the 32 bits assembly can be used anyway then. 64 bits x 16
>registers is quite some faster than that. I will be using C code only.
>
>Trivially i am against using 'long' anyway. Instead of 'long' i use 'int'.
>At alpha 'long' is 64 bits, at x86 it is 32 bits. Int is 32 bits however at most
>hardware and you don't run the risk that you need to put a 'L' at every compare
>with a number.
>  if( x == 5 ) // some compilers accept this. no matter what ansi says
>
>  if( x == 5L ) // works correct
>
>So using 'long' is a bad idea.
>
>On the other hand i do use 'long long' for 64 bits of course in GCC and
>hopefully in the future also at newer windows compilers when they follow the new
>standards.
>
>I can not possibly see a reason for a selfdefined abs implementation. Trying to
>outsmart the compiler guys in assembly usually is a bad idea and a waste of time
>unless you have a million dollar reason to do it (like at slow single chips for
>medical applications).
>

Yes, unless there is some abs macro with a conditional branch.

>>3. The clock count, regardless of the number of shifts or rotates, as determined
>>by CL or imm8.
>
>>>
>>>>hmm... i wouldn't expect that your one is so much slower - interesting.
>>>>Maybe, like Vincent already mentioned, it is the "slow" arithmetic shift instruction on
>>>>P4 and more dependencies. The cmov approach also needs only two
>>>>ALU-instructions (neg, cmp), whereas your approach needs three.
>>>>
>>>>Gerd
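
To make the shift-count point concrete, here is a minimal C sketch of the branchless abs from the thread, with the shift tied to the operand width (31 for 32-bit values, 63 for 64-bit values). The names abs32 and abs64 are only illustrative and do not come from any post above. The sketch assumes that >> on a negative signed value is an arithmetic shift, which the C standard leaves implementation-defined but which common x86 compilers provide. It also assumes the input is not the most negative value, whose absolute value is not representable.

```c
#include <assert.h>
#include <stdint.h>

/* Branchless absolute value for 32-bit values (illustrative name).
 * Assumes >> on a negative signed value is an arithmetic shift
 * (implementation-defined in C, usual behaviour on x86 compilers). */
static inline int32_t abs32(int32_t x)
{
    assert(x != INT32_MIN);   /* |INT32_MIN| is not representable */
    int32_t y = x >> 31;      /* 0 if x >= 0, -1 (all ones) if x < 0 */
    return (x ^ y) - y;       /* x unchanged, or (~x) + 1 == -x */
}

/* Same idea for 64-bit values: the sign mask needs a shift by 63. */
static inline int64_t abs64(int64_t x)
{
    assert(x != INT64_MIN);   /* |INT64_MIN| is not representable */
    int64_t y = x >> 63;      /* 0 if x >= 0, -1 (all ones) if x < 0 */
    return (x ^ y) - y;
}
```

With a 64-bit value such as 0x739394abcde12345, a shift by 31 leaves a nonzero positive number instead of the 0 mask that a non-negative input needs, so the xor/sub pair no longer returns x. Using fixed-width types from <stdint.h> also sidesteps the question of whether 'long' is 32 or 64 bits on a given platform.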
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.