Computer Chess Club Archives


Subject: Re: Implementation of the abs() function [o.t.]

Author: Dieter Buerssner

Date: 11:27:51 07/06/03


On July 06, 2003 at 13:29:56, Gerd Isenberg wrote:

>What about unrolling the loop a bit, eg. repeat the body statement 2..10 times.

I unrolled by 4, changing DECLARE_TEST_FUNC to:

#define DECLARE_TEST_FUNC(name) \
unsigned long tfunc_##name(void) \
{                                \
  int a;                         \
  unsigned long n=N_ITERATIONS/4;  \
  unsigned long sum=0;           \
  do                             \
  {                              \
    a = RAND_VAL();              \
    sum += name(a);              \
    a = RAND_VAL();              \
    sum += name(a);              \
    a = RAND_VAL();              \
    sum += name(a);              \
    a = RAND_VAL();              \
    sum += name(a);              \
  } while (--n != 0);            \
  n = N_ITERATIONS%4;            \
  if (n)                         \
    do                           \
    {                            \
      a = RAND_VAL();            \
      sum += name(a);            \
    } while (--n != 0);          \
  return sum;                    \
}
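
One way to check that the unrolling does not change what is computed is to compare checksums between a plain loop and the unrolled one on the same input sequence. The following is only a minimal sketch: demo_seed(), demo_rand_val(), sum_plain(), sum_unrolled4() and checksums_agree() are hypothetical names, the small generator is a stand-in for RAND_VAL() (it is not sr32_rand), and simple_abs() here is an assumed plain version rather than the definition used earlier in the thread. Note that the do/while form in the macro relies on N_ITERATIONS being at least 4; the sketch uses while loops so smaller counts work too.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for RAND_VAL(): a small LCG whose state can be
   reset, so that two runs see exactly the same inputs. */
static unsigned long demo_state = 1;

static void demo_seed(unsigned long s) { demo_state = s; }

static int demo_rand_val(void)
{
    demo_state = (demo_state * 1103515245UL + 12345UL) & 0xffffffffUL;
    /* Map the top 16 bits to -32768..32767 so negative inputs occur. */
    return (int)(demo_state >> 16) - 32768;
}

/* Assumed plain abs; -INT_MIN would overflow, so it is excluded. */
static int simple_abs(int a)
{
    assert(a != INT_MIN);
    return a < 0 ? -a : a;
}

/* Reference loop: one call per iteration. */
static unsigned long sum_plain(int (*f)(int), unsigned long iterations)
{
    unsigned long sum = 0;
    while (iterations-- != 0)
        sum += (unsigned long)f(demo_rand_val());
    return sum;
}

/* Unrolled by 4, followed by a remainder loop for iterations % 4. */
static unsigned long sum_unrolled4(int (*f)(int), unsigned long iterations)
{
    unsigned long n = iterations / 4;
    unsigned long sum = 0;
    while (n-- != 0) {
        sum += (unsigned long)f(demo_rand_val());
        sum += (unsigned long)f(demo_rand_val());
        sum += (unsigned long)f(demo_rand_val());
        sum += (unsigned long)f(demo_rand_val());
    }
    n = iterations % 4;
    while (n-- != 0)
        sum += (unsigned long)f(demo_rand_val());
    return sum;
}

/* Returns 1 when both loops produce the same checksum. */
static int checksums_agree(unsigned long iterations)
{
    unsigned long a, b;
    demo_seed(1);
    a = sum_plain(simple_abs, iterations);
    demo_seed(1);
    b = sum_unrolled4(simple_abs, iterations);
    return a == b;
}
```

Calling checksums_agree() with a count that is not a multiple of 4 also exercises the remainder loop.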

MSVC -Ox2 -Ob2 -G6 -Gr -GF, library rand()
       nothing 3951541892 12.578
           abs 1713113360 13.699
    simple_abs 1713113360 17.395
      omid_abs 1713113360 13.900
       sbb_abs 1713113360 18.116
       cdq_abs 1713113360 20.009
      fish_abs 1713113360 19.969
       sar_abs 1713113360 18.186
     cmovl_abs 1713113360 17.165
     cmovs_abs 1713113360 18.436

omid_abs is back to normal speed. Otherwise, unrolling has little effect. I
think this is to be expected, assuming that rand() is the slowest part of
everything and that the branches of the long loop are always predicted
correctly. The strangeness remains: without unrolling, I can always reproduce
the 7.x seconds.

MSVC -Ox2 -Ob2 -G6 -Gr -GF, sr32_rand:
       nothing 1305123480 5.758
           abs 2955546426 6.159
    simple_abs 2955546426 12.918
      omid_abs 2955546426 6.119
       sbb_abs 2955546426 15.713
       cdq_abs 2955546426 17.605
      fish_abs 2955546426 20.600
       sar_abs 2955546426 18.186
     cmovl_abs 2955546426 17.635
     cmovs_abs 2955546426 17.826

Gcc -O3, sr32_rand:

       nothing 1305123480 5.714
           abs 2955546426 12.747
    simple_abs 2955546426 12.747
      omid_abs 2955546426 5.879
       sbb_abs 2955546426 6.593
       cdq_abs 2955546426 6.099
      fish_abs 2955546426 12.747
       sar_abs 2955546426 6.154
     cmovl_abs 2955546426 5.769
     cmovs_abs 2955546426 5.714

Gcc -O3 -march=pentium4, sr32_rand:

       nothing 1305123480 5.714
           abs 2955546426 5.714
    simple_abs 2955546426 5.714
      omid_abs 2955546426 6.044
       sbb_abs 2955546426 6.538
       cdq_abs 2955546426 6.044
      fish_abs 2955546426 12.802
       sar_abs 2955546426 6.154
     cmovl_abs 2955546426 5.769
     cmovs_abs 2955546426 5.769

Library abs() and simple_abs() are fast now, because conditional movs are used:

In AT&T syntax:
        movl    %eax, %ecx
        negl    %ecx
        cmpl    $-1, %eax
        cmovle  %ecx, %eax
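
For readers less used to AT&T syntax, the four instructions can be read back into C roughly as follows. This is a sketch of the logic only; cmov_style_abs() is a hypothetical name, and whether a compiler actually emits a conditional move for it depends on the compiler and the target flags (as the tables above show, gcc needed -march=pentium4 here).

```c
#include <assert.h>
#include <limits.h>

/* Mirrors the instruction sequence above, one step per statement. */
static int cmov_style_abs(int a)
{
    int neg;
    assert(a != INT_MIN);   /* -INT_MIN overflows in C */
    neg = -a;               /* movl %eax, %ecx ; negl %ecx */
    if (a <= -1)            /* cmpl $-1, %eax */
        a = neg;            /* cmovle %ecx, %eax */
    return a;
}
```

The negation is computed unconditionally and only the final select depends on the sign, which is what lets the compiler avoid a branch.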

>Doubling the speed of a function by adding additional abs code - not bad ;-)

Indeed. Or is it the other way around, and for some strange reason rand() does
not run at normal speed here? I cannot get a loop with only rand() (no sum,
etc.) to run as fast as the loop with rand() and omid_abs together.
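
A rand()-only loop of the kind meant here could be timed roughly like this. It is a sketch, not the harness used for the numbers above: time_rand_only() is a hypothetical name, and clock() measures CPU time with whatever resolution the C library provides.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Times `iterations` calls to the library rand() with no other work in
   the loop and returns the elapsed CPU time in seconds. */
static double time_rand_only(unsigned long iterations)
{
    clock_t start = clock();
    /* clock() returns (clock_t)-1 when processor time is unavailable. */
    assert(start != (clock_t)-1);
    while (iterations-- != 0)
        (void)rand();       /* result discarded: no sum, no abs */
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Running it with the same iteration count as the test functions gives a baseline to compare against the "nothing" row.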

Regards,
Dieter




Last modified: Thu, 15 Apr 21 08:11:13 -0700