Computer Chess Club Archives



Subject: Re: Implementation of the abs() function [o.t.]

Author: Gerd Isenberg

Date: 10:29:56 07/06/03



On July 06, 2003 at 12:57:59, Dieter Buerssner wrote:

>On July 06, 2003 at 05:02:50, Gerd Isenberg wrote:
>
>
>>With MSVC, using the math.h abs is fastest. With gcc, the cdq inline-assembly abs or Omid's
>>C abs is much faster than the branching library abs (maybe a macro from some header
>>file?).
>
>Hi Gerd, as far as I can see, abs is not a macro in my gcc environment. It wouldn't
>be possible with Standard C methods, would it? Because you would not be allowed
>to evaluate the argument twice. Of course, they could use compiler-specific
>extensions and/or inlining. I checked by preprocessing the source. I think gcc
>will detect abs() just like other functions (memcpy, for example) and can inline
>it directly. Indeed, I see the "simple_abs" method branch in the assembly.
>
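A minimal sketch of Dieter's point about macros, assuming nothing beyond Standard C: a naive macro such as the hypothetical `NAIVE_ABS` repeats its argument in the expansion, so an argument with a side effect (here the hypothetical `next_value()`, which counts its own calls) is evaluated twice, while the library `abs()` evaluates it exactly once.

```c
#include <stdlib.h>

/* Hypothetical naive macro: x appears twice in the expansion
 * (once in the test, once in the selected result). */
#define NAIVE_ABS(x) ((x) < 0 ? -(x) : (x))

/* Counts how many times next_value() has been evaluated. */
static int calls = 0;

/* Hypothetical argument with a side effect: bumps the counter, returns -5. */
static int next_value(void)
{
    ++calls;
    return -5;
}

/* Evaluates NAIVE_ABS(next_value()); returns the number of evaluations. */
static int evaluations_via_macro(int *result)
{
    calls = 0;
    *result = NAIVE_ABS(next_value()); /* called in the test, then again in -(x) */
    return calls;
}

/* Same argument through the library function; returns the number of evaluations. */
static int evaluations_via_function(int *result)
{
    calls = 0;
    *result = abs(next_value());       /* a function evaluates its argument once */
    return calls;
}
```

Both helpers store 5 in `*result`, but the macro version reports 2 evaluations and the function version 1. The conditional operator has a sequence point after its first operand, so the double call is well defined, just not what a caller of abs() expects.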
>The strange thing is that omid_abs was significantly faster than nothing with MSVC
>and rand(). Do you have any idea?

Hmm, not really. Maybe because omid_abs is the only one that predicts the
conditional loop jump correctly all the time ;-)

What about unrolling the loop a bit, e.g. repeating the body statement 2 to 10 times?
Doubling the speed of a function by adding extra abs code: not bad ;-)
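The unrolling suggestion, sketched under assumptions: the function names are hypothetical, library `abs()` stands in for whichever variant is being timed, and the accumulator is `unsigned` so that wraparound is well defined. Both versions consume the same `rand()` sequence, so with the same seed they return the same sum; only the number of times the loop branch executes differs.

```c
#include <stdlib.h>

/* Rolled loop: one abs per trip, one loop branch per abs. */
static unsigned bench_abs_rolled(long n)
{
    unsigned sum = 0;
    for (long i = 0; i < n; ++i)
        sum += (unsigned)abs(rand() - 16384);
    return sum;
}

/* Hypothetical unrolled variant: body repeated 4 times per trip,
 * so the loop branch runs about n/4 times. */
static unsigned bench_abs_unrolled4(long n)
{
    unsigned sum = 0;
    long i = 0;
    for (; i + 4 <= n; i += 4) {
        sum += (unsigned)abs(rand() - 16384);
        sum += (unsigned)abs(rand() - 16384);
        sum += (unsigned)abs(rand() - 16384);
        sum += (unsigned)abs(rand() - 16384);
    }
    for (; i < n; ++i)   /* leftover iterations when n is not a multiple of 4 */
        sum += (unsigned)abs(rand() - 16384);
    return sum;
}
```

Whether the unrolled form actually changes the timing on a given CPU is exactly what such a test would measure; the sketch only guarantees that both loops do the same work.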

Gerd


Here is the assembly of tfunc_omid_abs:
>
>PUBLIC  @tfunc_omid_abs@0
>;       COMDAT @tfunc_omid_abs@0
>_TEXT   SEGMENT
>@tfunc_omid_abs@0 PROC NEAR                             ; COMDAT
>; Line 61
>        push    esi
>        push    edi
>        xor     esi, esi
>        mov     edi, 1000000000                         ; 3b9aca00H
>$L877:
>        call    _rand
>        sub     eax, 16384                              ; 00004000H
>        mov     ecx, eax
>        sar     ecx, 31                                 ; 0000001fH
>        mov     edx, ecx
>        xor     edx, eax
>        sub     edx, ecx
>        add     esi, edx
>        dec     edi
>        jne     SHORT $L877
>        pop     edi
>        mov     eax, esi
>        pop     esi
>        ret     0
>@tfunc_omid_abs@0 ENDP
>
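The thread shows only the compiled code for omid_abs, not its C source. Read back from the assembly, the sequence is the common sign-mask idiom: `sar` by 31 yields a mask m that is 0 for non-negative x and -1 (all ones) for negative x, and (x ^ m) - m then equals x or -x. The sketch below is a reconstruction under stated assumptions (32-bit `int`, arithmetic right shift of negative values, which C leaves implementation-defined but gcc and MSVC provide on x86); `mask_abs` and `branch_abs` are hypothetical names, and `branch_abs` merely illustrates the kind of branching code described for simple_abs.

```c
/* Branch-free abs mirroring the sar/xor/sub sequence.
 * Assumes 32-bit int and an arithmetic right shift (implementation-defined in C). */
static int mask_abs(int x)
{
    int m = x >> 31;      /* sar: 0 if x >= 0, -1 (all ones) if x < 0 */
    return (x ^ m) - m;   /* xor, then sub: x if m == 0, ~x + 1 == -x if m == -1 */
}

/* Branching version for comparison. */
static int branch_abs(int x)
{
    return x < 0 ? -x : x;
}
```

The same identity underlies the cdq/xor/sub sequence of the library abs: cdq fills edx with copies of the sign bit of eax, which is the same mask. Like `abs(INT_MIN)`, `mask_abs(INT_MIN)` overflows and is undefined.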
>Now for tfunc_nothing
>
>;       COMDAT @tfunc_nothing@0
>_TEXT   SEGMENT
>@tfunc_nothing@0 PROC NEAR                              ; COMDAT
>; Line 228
>        push    esi
>        push    edi
>        xor     esi, esi
>        mov     edi, 1000000000                         ; 3b9aca00H
>$L969:
>        call    _rand
>        dec     edi
>        lea     esi, DWORD PTR [esi+eax-16384]
>        jne     SHORT $L969
>        pop     edi
>        mov     eax, esi
>        pop     esi
>        ret     0
>@tfunc_nothing@0 ENDP
>
>Looks about as tight as possible: a += rand()-16384 is done with a single lea.
>But it also shows that, with this method and the compiler's clever inlining,
>things are not 100% comparable.
>
>And tfunc_abs (library):
>
>PUBLIC  @tfunc_abs@0
>;       COMDAT @tfunc_abs@0
>_TEXT   SEGMENT
>@tfunc_abs@0 PROC NEAR                                  ; COMDAT
>; Line 229
>        push    esi
>        push    edi
>        xor     esi, esi
>        mov     edi, 1000000000                         ; 3b9aca00H
>$L978:
>        call    _rand
>        sub     eax, 16384                              ; 00004000H
>        cdq
>        xor     eax, edx
>        sub     eax, edx
>        add     esi, eax
>        dec     edi
>        jne     SHORT $L978
>        pop     edi
>        mov     eax, esi
>        pop     esi
>        ret     0
>@tfunc_abs@0 ENDP
>
>All very similar; all should take comparable time (dominated by rand()), but
>tfunc_omid_abs is twice as fast!
>
>Does the P4 like aligned jump labels? Can they cause such extreme effects? Hard
>to believe.
>
>BTW, when I
>
>#define RAND_VAL() ((int)n)
>
>to get rid of the rand() overhead (which of course also gives the branching
>versions an advantage), I get normal results:
>
>      function   checksum time (s)
>       nothing 4051657984 0.811
>           abs 4051657984 1.702
>    simple_abs 4051657984 1.923
>      omid_abs 4051657984 1.702
>       sbb_abs 4051657984 4.156
>       cdq_abs 4051657984 4.457
>      fish_abs 4051657984 2.063
>       sar_abs 4051657984 3.324
>     cmovl_abs 4051657984 2.604
>     cmovs_abs 4051657984 2.644
>
>4051657984 = ((1e9 * (1e9+1))/2) % 2^32, as expected for N_ITERATIONS=1e9.
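A sketch of why every variant prints the same checksum, assuming the timing loop counts n down from N_ITERATIONS to 1 (as the dec/jne loop suggests) and accumulates into a 32-bit value. With RAND_VAL() defined as ((int)n), every input is positive, so abs() returns n unchanged and the sum is 1 + 2 + ... + N reduced modulo 2^32. `checksum_loop` and `expected_checksum` are hypothetical names, not the original test functions.

```c
#include <stdint.h>
#include <stdlib.h>

/* Deterministic input, as in the #define above; n is the loop counter. */
#define RAND_VAL() ((int)n)

/* Hypothetical reconstruction of one timing loop: n runs from n_iterations
 * down to 1 (must not exceed INT_MAX), and the uint32_t sum wraps modulo 2^32. */
static uint32_t checksum_loop(uint32_t n_iterations)
{
    uint32_t a = 0;
    for (uint32_t n = n_iterations; n != 0; --n)
        a += (uint32_t)abs(RAND_VAL());
    return a;
}

/* Closed form of the same checksum: (N * (N + 1) / 2) mod 2^32. */
static uint32_t expected_checksum(uint64_t n_iterations)
{
    uint64_t total = n_iterations * (n_iterations + 1) / 2; /* fits in 64 bits for N = 1e9 */
    return (uint32_t)total;  /* conversion to uint32_t reduces modulo 2^32 */
}
```

For N = 1e9 the closed form gives 4051657984. Since the input is always positive, a branching abs always takes the same path, which is the advantage for the branch-using versions mentioned above.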
>
>The 0.8 s for nothing is about 2 cycles per iteration, which seems reasonable for the loop:
>
>$L977:
>        add     eax, ecx
>        dec     ecx
>        jne     SHORT $L977
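The per-iteration estimate, worked out. The post does not state the clock speed; a clock of roughly 2.5 GHz is assumed here purely for illustration (the thread mentions a P4):

```latex
\frac{t}{N} = \frac{0.811\,\mathrm{s}}{10^{9}} = 0.811\,\mathrm{ns},
\qquad
\frac{t}{N}\cdot f \approx 0.811\,\mathrm{ns} \times 2.5\,\mathrm{GHz} \approx 2.0\ \text{cycles per iteration}
```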
>
>Regards,
>Dieter




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.