Author: Gerd Isenberg
Date: 10:29:56 07/06/03
On July 06, 2003 at 12:57:59, Dieter Buerssner wrote:

>On July 06, 2003 at 05:02:50, Gerd Isenberg wrote:
>
>>With MSVC using math.h abs is fastest. With gcc cdq inline assembly abs or omids
>>c-abs is much faster than the branching lib abs (maybe a macro from some header
>>file?).
>
>Hi Gerd, as far as I can see, abs is no macro in my gcc environment. It wouldn't
>be possible with Standard C methods, would it? Because you would not be allowed
>to evaluate the argument twice. Of course, they could use compiler specific
>extensions and/or inlining. I checked by precompiling the source. I think, Gcc
>will detect abs() just like other functions (memcpy for example) and can inline
>it directly. Indeed I see the "simple_abs" method branch in the assembly.
>
>The strange thing, that omid_abs was significantly faster than nothing with MSVC
>and rand(), do you have any idea?

Hmm, not really - maybe because omids_abs is the only one which predicts the
conditional loop jump correctly all the time ;-)

What about unrolling the loop a bit, e.g. repeat the body statement 2..10 times.
Doubling the speed of a function by adding additional abs code - not bad ;-)

Gerd

>Here the assembly of tfunc_omid_abs
>
>PUBLIC  @tfunc_omid_abs@0
>; COMDAT @tfunc_omid_abs@0
>_TEXT   SEGMENT
>@tfunc_omid_abs@0 PROC NEAR     ; COMDAT
>; Line 61
>        push    esi
>        push    edi
>        xor     esi, esi
>        mov     edi, 1000000000         ; 3b9aca00H
>$L877:
>        call    _rand
>        sub     eax, 16384              ; 00004000H
>        mov     ecx, eax
>        sar     ecx, 31                 ; 0000001fH
>        mov     edx, ecx
>        xor     edx, eax
>        sub     edx, ecx
>        add     esi, edx
>        dec     edi
>        jne     SHORT $L877
>        pop     edi
>        mov     eax, esi
>        pop     esi
>        ret     0
>@tfunc_omid_abs@0 ENDP
>
>Now for tfunc_nothing
>
>; COMDAT @tfunc_nothing@0
>_TEXT   SEGMENT
>@tfunc_nothing@0 PROC NEAR      ; COMDAT
>; Line 228
>        push    esi
>        push    edi
>        xor     esi, esi
>        mov     edi, 1000000000         ; 3b9aca00H
>$L969:
>        call    _rand
>        dec     edi
>        lea     esi, DWORD PTR [esi+eax-16384]
>        jne     SHORT $L969
>        pop     edi
>        mov     eax, esi
>        pop     esi
>        ret     0
>@tfunc_nothing@0 ENDP
>
>Looks about as tight as possible. The a += rand()-16384 with one lea.
>But it also shows that with this method and clever inlining by the compiler,
>things are not 100% comparable.
>
>And tfunc_abs (library):
>
>PUBLIC  @tfunc_abs@0
>; COMDAT @tfunc_abs@0
>_TEXT   SEGMENT
>@tfunc_abs@0 PROC NEAR          ; COMDAT
>; Line 229
>        push    esi
>        push    edi
>        xor     esi, esi
>        mov     edi, 1000000000         ; 3b9aca00H
>$L978:
>        call    _rand
>        sub     eax, 16384              ; 00004000H
>        cdq
>        xor     eax, edx
>        sub     eax, edx
>        add     esi, eax
>        dec     edi
>        jne     SHORT $L978
>        pop     edi
>        mov     eax, esi
>        pop     esi
>        ret     0
>@tfunc_abs@0 ENDP
>
>All very similar, all should use comparable time (the time of rand()), but
>tfunc_omid_abs is twice as fast!
>
>Does the P4 like aligned jump labels? Can they give such extreme effects? Hard
>to believe.
>
>BTW.
>When I
>
>#define RAND_VAL() ((int)n)
>
>to get rid of the rand() overhead (and of course also giving the branch using
>versions an advantage), I get normal results:
>
>     nothing 4051657984 0.811
>         abs 4051657984 1.702
>  simple_abs 4051657984 1.923
>    omid_abs 4051657984 1.702
>     sbb_abs 4051657984 4.156
>     cdq_abs 4051657984 4.457
>    fish_abs 4051657984 2.063
>     sar_abs 4051657984 3.324
>   cmovl_abs 4051657984 2.604
>   cmovs_abs 4051657984 2.644
>
>4051657984 = ((1e9 * (1e9+1))/2) % 2^^32; as expected for N_ITERATIONS=1e9.
>
>The 0.8 s for nothing is about 2 cycles per iteration, which seems reasonable
>for the loop
>
>$L977:
>        add     eax, ecx
>        dec     ecx
>        jne     SHORT $L977
>
>Regards,
>Dieter
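
For reference, the sar/xor/sub sequence from tfunc_omid_abs and the checksum Dieter quotes can be written in C roughly as below. This is only a sketch: the names abs_sar, abs_mask and sum_1_to_n_mod32 are made up here and do not come from the original test program. abs_sar assumes a 32-bit int and an arithmetic right shift for negative values, which C leaves implementation-defined.

```c
#include <stdint.h>

/* Branchless abs in the style of the sar/xor/sub listing.
 * Assumption: int is 32 bits and >> on a negative int is an arithmetic
 * shift, so mask is 0 for x >= 0 and -1 (all ones) for x < 0.
 * Like the library abs, the result overflows for INT_MIN. */
int abs_sar(int x)
{
    int mask = x >> 31;          /* sar ecx, 31 */
    return (x ^ mask) - mask;    /* xor, then sub */
}

/* Same idea without shifting a negative value: (x < 0) is 0 or 1,
 * so negating it gives the 0 / -1 mask portably. */
int abs_mask(int x)
{
    int mask = -(x < 0);
    return (x ^ mask) - mask;
}

/* Checksum expected when RAND_VAL() is ((int)n): the sum 1 + 2 + ... + n,
 * truncated to 32 bits. n * (n + 1) fits in 64 bits for n = 1e9. */
uint32_t sum_1_to_n_mod32(uint64_t n)
{
    return (uint32_t)(n * (n + 1) / 2);
}
```

With n = 1000000000 the last function yields 4051657984, the value in the checksum column above.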
Last modified: Thu, 15 Apr 21 08:11:13 -0700