Author: Dieter Buerssner
Date: 09:57:59 07/06/03
On July 06, 2003 at 05:02:50, Gerd Isenberg wrote:

>With mvc using math.h abs is fastest. With gcc cdq inline assembly abs or omids
>c-abs is much faster than the branching lib abs (maybe a macro from some header
>file?).

Hi Gerd,

as far as I can see, abs is not a macro in my gcc environment. It wouldn't be possible with Standard C methods, would it? A macro would not be allowed to evaluate its argument twice. Of course, they could use compiler-specific extensions and/or inlining. I checked by preprocessing the source.

I think gcc detects abs(), just as it does other functions (memcpy, for example), and can inline it directly. Indeed, I see the branch of the "simple_abs" method in the assembly.

The strange thing is that omid_abs was significantly faster than nothing with MSVC and rand(). Do you have any idea why? Here is the assembly of tfunc_omid_abs:

    PUBLIC  @tfunc_omid_abs@0
    ; COMDAT @tfunc_omid_abs@0
    _TEXT   SEGMENT
    @tfunc_omid_abs@0 PROC NEAR             ; COMDAT
    ; Line 61
            push    esi
            push    edi
            xor     esi, esi
            mov     edi, 1000000000         ; 3b9aca00H
    $L877:
            call    _rand
            sub     eax, 16384              ; 00004000H
            mov     ecx, eax
            sar     ecx, 31                 ; 0000001fH
            mov     edx, ecx
            xor     edx, eax
            sub     edx, ecx
            add     esi, edx
            dec     edi
            jne     SHORT $L877
            pop     edi
            mov     eax, esi
            pop     esi
            ret     0
    @tfunc_omid_abs@0 ENDP

Now for tfunc_nothing:

    ; COMDAT @tfunc_nothing@0
    _TEXT   SEGMENT
    @tfunc_nothing@0 PROC NEAR              ; COMDAT
    ; Line 228
            push    esi
            push    edi
            xor     esi, esi
            mov     edi, 1000000000         ; 3b9aca00H
    $L969:
            call    _rand
            dec     edi
            lea     esi, DWORD PTR [esi+eax-16384]
            jne     SHORT $L969
            pop     edi
            mov     eax, esi
            pop     esi
            ret     0
    @tfunc_nothing@0 ENDP

Looks about as tight as possible: the a += rand()-16384 is done with a single lea. But it also shows that, with this method and clever inlining by the compiler, things are not 100% comparable.
And tfunc_abs (library):

    PUBLIC  @tfunc_abs@0
    ; COMDAT @tfunc_abs@0
    _TEXT   SEGMENT
    @tfunc_abs@0 PROC NEAR                  ; COMDAT
    ; Line 229
            push    esi
            push    edi
            xor     esi, esi
            mov     edi, 1000000000         ; 3b9aca00H
    $L978:
            call    _rand
            sub     eax, 16384              ; 00004000H
            cdq
            xor     eax, edx
            sub     eax, edx
            add     esi, eax
            dec     edi
            jne     SHORT $L978
            pop     edi
            mov     eax, esi
            pop     esi
            ret     0
    @tfunc_abs@0 ENDP

All very similar, and all should take comparable time (the time of rand()), but tfunc_omid_abs is twice as fast! Does the P4 like aligned jump labels? Can they have such extreme effects? Hard to believe.

BTW, when I

    #define RAND_VAL() ((int)n)

to get rid of the rand() overhead (and of course also giving the branching versions an advantage), I get normal results:

    function     checksum     time (s)
    nothing      4051657984   0.811
    abs          4051657984   1.702
    simple_abs   4051657984   1.923
    omid_abs     4051657984   1.702
    sbb_abs      4051657984   4.156
    cdq_abs      4051657984   4.457
    fish_abs     4051657984   2.063
    sar_abs      4051657984   3.324
    cmovl_abs    4051657984   2.604
    cmovs_abs    4051657984   2.644

4051657984 = ((1e9 * (1e9+1))/2) % 2^32, as expected for N_ITERATIONS=1e9. The 0.8 s for nothing is about 2 cycles per iteration, which seems reasonable for the loop

    $L977:
            add     eax, ecx
            dec     ecx
            jne     SHORT $L977

Regards,
Dieter
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.