Author: Omid David Tabibi
Date: 11:00:03 07/06/03
On July 06, 2003 at 13:29:56, Gerd Isenberg wrote:

>On July 06, 2003 at 12:57:59, Dieter Buerssner wrote:
>
>>On July 06, 2003 at 05:02:50, Gerd Isenberg wrote:
>>
>>
>>>With MSVC, the math.h abs is fastest. With gcc, the cdq inline-assembly abs or Omid's
>>>C abs is much faster than the branching library abs (maybe a macro from some header
>>>file?).
>>
>>Hi Gerd, as far as I can see, abs is not a macro in my gcc environment. It wouldn't
>>be possible with Standard C methods, would it? A macro would not be allowed
>>to evaluate its argument twice. Of course, they could use compiler-specific
>>extensions and/or inlining. I checked by preprocessing the source. I think gcc
>>recognizes abs() just like other built-in functions (memcpy, for example) and can inline
>>it directly. Indeed, I see the "simple_abs" method branch in the assembly.
>>
>>The strange thing is that omid_abs was significantly faster than nothing with MSVC
>>and rand(). Do you have any idea why?
>
>Hmm, not really. Maybe because omid_abs is the only one that predicts the
>conditional loop jump correctly every time ;-)

That's what I thought, but apparently 'sar' costs more than the branch I tried to avoid.

>
>What about unrolling the loop a bit, e.g. repeating the body statement 2..10 times?
>Doubling the speed of a function by adding additional abs code - not bad ;-)
>
>Gerd
>
>
>>Here is the assembly of tfunc_omid_abs:
>>
>>PUBLIC @tfunc_omid_abs@0
>>; COMDAT @tfunc_omid_abs@0
>>_TEXT SEGMENT
>>@tfunc_omid_abs@0 PROC NEAR ; COMDAT
>>; Line 61
>>        push    esi
>>        push    edi
>>        xor     esi, esi
>>        mov     edi, 1000000000 ; 3b9aca00H
>>$L877:
>>        call    _rand
>>        sub     eax, 16384      ; 00004000H
>>        mov     ecx, eax
>>        sar     ecx, 31         ; 0000001fH
>>        mov     edx, ecx
>>        xor     edx, eax
>>        sub     edx, ecx
>>        add     esi, edx
>>        dec     edi
>>        jne     SHORT $L877
>>        pop     edi
>>        mov     eax, esi
>>        pop     esi
>>        ret     0
>>@tfunc_omid_abs@0 ENDP
>>
>>Now for tfunc_nothing:
>>
>>; COMDAT @tfunc_nothing@0
>>_TEXT SEGMENT
>>@tfunc_nothing@0 PROC NEAR ; COMDAT
>>; Line 228
>>        push    esi
>>        push    edi
>>        xor     esi, esi
>>        mov     edi, 1000000000 ; 3b9aca00H
>>$L969:
>>        call    _rand
>>        dec     edi
>>        lea     esi, DWORD PTR [esi+eax-16384]
>>        jne     SHORT $L969
>>        pop     edi
>>        mov     eax, esi
>>        pop     esi
>>        ret     0
>>@tfunc_nothing@0 ENDP
>>
>>It looks about as tight as possible: a += rand()-16384 is done with a single lea.
>>But it also shows that, with this method and the compiler's clever inlining,
>>the variants are not 100% comparable.
>>
>>And tfunc_abs (library):
>>
>>PUBLIC @tfunc_abs@0
>>; COMDAT @tfunc_abs@0
>>_TEXT SEGMENT
>>@tfunc_abs@0 PROC NEAR ; COMDAT
>>; Line 229
>>        push    esi
>>        push    edi
>>        xor     esi, esi
>>        mov     edi, 1000000000 ; 3b9aca00H
>>$L978:
>>        call    _rand
>>        sub     eax, 16384      ; 00004000H
>>        cdq
>>        xor     eax, edx
>>        sub     eax, edx
>>        add     esi, eax
>>        dec     edi
>>        jne     SHORT $L978
>>        pop     edi
>>        mov     eax, esi
>>        pop     esi
>>        ret     0
>>@tfunc_abs@0 ENDP
>>
>>All very similar; all should take comparable time (dominated by rand()), but
>>tfunc_omid_abs is twice as fast!
>>
>>Does the P4 like aligned jump labels? Can they cause such extreme effects? Hard
>>to believe.
>>
>>BTW, when I
>>
>>#define RAND_VAL() ((int)n)
>>
>>to get rid of the rand() overhead (which, of course, also gives the branching
>>versions an advantage), I get normal results:
>>
>>  variant      sum         time (s)
>>  nothing      4051657984  0.811
>>  abs          4051657984  1.702
>>  simple_abs   4051657984  1.923
>>  omid_abs     4051657984  1.702
>>  sbb_abs      4051657984  4.156
>>  cdq_abs      4051657984  4.457
>>  fish_abs     4051657984  2.063
>>  sar_abs      4051657984  3.324
>>  cmovl_abs    4051657984  2.604
>>  cmovs_abs    4051657984  2.644
>>
>>4051657984 = ((1e9 * (1e9+1))/2) % 2^32, as expected for N_ITERATIONS=1e9.
>>
>>The 0.8 s for nothing is about 2 cycles per iteration, which seems reasonable
>>for the loop
>>
>>$L977:
>>        add     eax, ecx
>>        dec     ecx
>>        jne     SHORT $L977
>>
>>Regards,
>>Dieter
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.