Computer Chess Club Archives


Subject: Re: Implementation of the abs() function [o.t.]

Author: Gerd Isenberg

Date: 11:12:34 07/06/03


On July 06, 2003 at 14:00:03, Omid David Tabibi wrote:

>On July 06, 2003 at 13:29:56, Gerd Isenberg wrote:
>
>>On July 06, 2003 at 12:57:59, Dieter Buerssner wrote:
>>
>>>On July 06, 2003 at 05:02:50, Gerd Isenberg wrote:
>>>
>>>
>>>>With MSVC, abs from math.h is fastest. With gcc, the cdq inline-assembly abs or Omid's
>>>>C abs is much faster than the branching library abs (maybe a macro from some header
>>>>file?).
>>>
>>>Hi Gerd, as far as I can see, abs is no macro in my gcc environment. It wouldn't
>>>be possible with Standard C methods, would it? Because you would not be allowed
>>>to evaluate the argument twice. Of course, they could use compiler specific
>>>extensions and/or inlining. I checked by preprocessing the source. I think gcc
>>>detects abs() just like other functions (memcpy, for example) and can inline
>>>it directly. Indeed, I see the "simple_abs" method's branch in the assembly.
>>>
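The double-evaluation problem mentioned above can be shown with a small sketch. NAIVE_ABS, next_value and the two counting helpers are hypothetical names made up for this illustration; they are not taken from any real header or from the benchmark discussed in this thread.

```c
#include <stdlib.h>  /* abs */

/* Hypothetical macro, for illustration only: not from any real header. */
#define NAIVE_ABS(x) ((x) < 0 ? -(x) : (x))

static int calls = 0;

/* Helper with a visible side effect: counts how often it is evaluated. */
static int next_value(void)
{
    ++calls;
    return -7;
}

/* Number of times the argument expression runs when passed to the macro. */
static int evaluations_with_macro(void)
{
    calls = 0;
    (void)NAIVE_ABS(next_value());
    return calls;
}

/* Number of times the argument expression runs when passed to abs(). */
static int evaluations_with_function(void)
{
    calls = 0;
    (void)abs(next_value());
    return calls;
}
```

The macro expands its argument once in the comparison and once more in the selected branch, so an argument with side effects runs twice. A function call evaluates its argument exactly once.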
>>>The strange thing is that omid_abs was significantly faster than "nothing" with MSVC
>>>and rand(). Do you have any idea why?
>>
>>Hmm, not really. Maybe because omid_abs is the only one that predicts the
>>conditional loop jump correctly every time ;-)
>
>That's what I thought; but apparently 'sar' costs more than the branch I tried
>to evade.
>


Maybe the sar latency is the reason that the outcome of an out-of-order,
pre-executed dec edi is predicted correctly. Doesn't VTune report branch mispredictions?


>
>>
>>What about unrolling the loop a bit, e.g. repeating the body statement 2..10 times?
>>Doubling the speed of a function by adding additional abs code - not bad ;-)
>>
>>Gerd
>>
>>
>>Here is the assembly of tfunc_omid_abs:
>>>
>>>PUBLIC  @tfunc_omid_abs@0
>>>;       COMDAT @tfunc_omid_abs@0
>>>_TEXT   SEGMENT
>>>@tfunc_omid_abs@0 PROC NEAR                             ; COMDAT
>>>; Line 61
>>>        push    esi
>>>        push    edi
>>>        xor     esi, esi
>>>        mov     edi, 1000000000                         ; 3b9aca00H
>>>$L877:
>>>        call    _rand
>>>        sub     eax, 16384                              ; 00004000H
>>>        mov     ecx, eax
>>>        sar     ecx, 31                                 ; 0000001fH
>>>        mov     edx, ecx
>>>        xor     edx, eax
>>>        sub     edx, ecx
>>>        add     esi, edx
>>>        dec     edi
>>>        jne     SHORT $L877
>>>        pop     edi
>>>        mov     eax, esi
>>>        pop     esi
>>>        ret     0
>>>@tfunc_omid_abs@0 ENDP
>>>
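For readers who prefer C to assembly, the sar/xor/sub sequence in the listing above corresponds roughly to the following sketch. The name branchless_abs is made up here, and the original C source of omid_abs is not shown in this post, so this is a reconstruction from the generated code rather than the author's exact function. It assumes a two's-complement int and that >> on a negative int is an arithmetic shift, which is implementation-defined in C.

```c
#include <limits.h>  /* CHAR_BIT */

/* Sketch of the sar/xor/sub idiom (hypothetical name, see lead-in).
   Assumes two's-complement int and an arithmetic right shift for
   negative values. Like abs(), it is undefined for INT_MIN. */
static int branchless_abs(int x)
{
    /* mask is 0 for x >= 0 and -1 (all bits set) for x < 0 */
    int mask = x >> (int)(sizeof(int) * CHAR_BIT - 1);

    /* x >= 0: (x ^ 0) - 0 == x;  x < 0: (~x) - (-1) == ~x + 1 == -x */
    return (x ^ mask) - mask;
}
```

The cdq/xor/sub sequence in the library version of the loop computes the same thing: cdq fills edx with the sign of eax, which plays the role of mask.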
>>>Now for tfunc_nothing
>>>
>>>;       COMDAT @tfunc_nothing@0
>>>_TEXT   SEGMENT
>>>@tfunc_nothing@0 PROC NEAR                              ; COMDAT
>>>; Line 228
>>>        push    esi
>>>        push    edi
>>>        xor     esi, esi
>>>        mov     edi, 1000000000                         ; 3b9aca00H
>>>$L969:
>>>        call    _rand
>>>        dec     edi
>>>        lea     esi, DWORD PTR [esi+eax-16384]
>>>        jne     SHORT $L969
>>>        pop     edi
>>>        mov     eax, esi
>>>        pop     esi
>>>        ret     0
>>>@tfunc_nothing@0 ENDP
>>>
>>>Looks about as tight as possible: the a += rand()-16384 is done with one lea.
>>>But it also shows that, with this method and clever inlining by the compiler,
>>>things are not 100% comparable.
>>>
>>>And tfunc_abs (library):
>>>
>>>PUBLIC  @tfunc_abs@0
>>>;       COMDAT @tfunc_abs@0
>>>_TEXT   SEGMENT
>>>@tfunc_abs@0 PROC NEAR                                  ; COMDAT
>>>; Line 229
>>>        push    esi
>>>        push    edi
>>>        xor     esi, esi
>>>        mov     edi, 1000000000                         ; 3b9aca00H
>>>$L978:
>>>        call    _rand
>>>        sub     eax, 16384                              ; 00004000H
>>>        cdq
>>>        xor     eax, edx
>>>        sub     eax, edx
>>>        add     esi, eax
>>>        dec     edi
>>>        jne     SHORT $L978
>>>        pop     edi
>>>        mov     eax, esi
>>>        pop     esi
>>>        ret     0
>>>@tfunc_abs@0 ENDP
>>>
>>>All very similar, and all should take comparable time (the time of rand()), but
>>>tfunc_omid_abs is twice as fast!
>>>
>>>Does the P4 like aligned jump labels? Can they have such extreme effects? Hard
>>>to believe.
>>>
>>>BTW, when I
>>>
>>>#define RAND_VAL() ((int)n)
>>>
>>>to get rid of the rand() overhead (which of course also gives the branch-using
>>>versions an advantage), I get normal results:
>>>
>>>      function   checksum  time (s)
>>>       nothing 4051657984 0.811
>>>           abs 4051657984 1.702
>>>    simple_abs 4051657984 1.923
>>>      omid_abs 4051657984 1.702
>>>       sbb_abs 4051657984 4.156
>>>       cdq_abs 4051657984 4.457
>>>      fish_abs 4051657984 2.063
>>>       sar_abs 4051657984 3.324
>>>     cmovl_abs 4051657984 2.604
>>>     cmovs_abs 4051657984 2.644
>>>
>>>4051657984 = ((1e9 * (1e9+1))/2) % 2^32, as expected for N_ITERATIONS=1e9.
>>>
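The checksum printed in the table, 4051657984, can be checked independently. A minimal sketch, assuming the loop sums the values 1..N_ITERATIONS into a 32-bit accumulator; expected_checksum is a name made up for this illustration.

```c
#include <stdint.h>

/* Sum of 1..n reduced modulo 2^32 (hypothetical helper, see lead-in).
   The product n * (n + 1) fits in 64 bits for n = 1e9. */
static uint32_t expected_checksum(uint64_t n)
{
    uint64_t sum = n * (n + 1) / 2;  /* closed form for 1 + 2 + ... + n */
    return (uint32_t)sum;            /* conversion to uint32_t reduces mod 2^32 */
}
```

Calling expected_checksum(1000000000) gives 4051657984, since 500000000500000000 = 116415321 * 2^32 + 4051657984.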
>>>The 0.8 s for nothing is about 2 cycles per iteration, which seems reasonable for the loop
>>>
>>>$L977:
>>>        add     eax, ecx
>>>        dec     ecx
>>>        jne     SHORT $L977
>>>
>>>Regards,
>>>Dieter



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.