Author: Dieter Buerssner
Date: 11:27:51 07/06/03
On July 06, 2003 at 13:29:56, Gerd Isenberg wrote:
>What about unrolling the loop a bit, eg. repeat the body statement 2..10 times.
I unrolled by 4; I changed DECLARE_TEST_FUNCTION to
#define DECLARE_TEST_FUNC(name) \
unsigned long tfunc_##name(void) \
{ \
    int a; \
    unsigned long n=N_ITERATIONS/4; \
    unsigned long sum=0; \
    do \
    { \
        a = RAND_VAL(); \
        sum += name(a); \
        a = RAND_VAL(); \
        sum += name(a); \
        a = RAND_VAL(); \
        sum += name(a); \
        a = RAND_VAL(); \
        sum += name(a); \
    } while (--n != 0); \
    n = N_ITERATIONS%4; \
    if (n) \
        do \
        { \
            a = RAND_VAL(); \
            sum += name(a); \
        } while (--n != 0); \
    return sum; \
}
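To make the unrolling pattern easier to check in isolation, here is a minimal sketch of the same idea written as ordinary functions instead of a macro. The names sum_plain and sum_unrolled4, the RAND_VAL definition, and simple_abs as written here are assumptions for illustration only; the post does not show the real definitions. With the same seed, both versions should consume the same random values and return the same sum.

```c
#include <stdlib.h>

/* Hypothetical stand-in: the post does not show how RAND_VAL is defined.
   This one yields values on both sides of zero. */
#define RAND_VAL() (rand() - RAND_MAX / 2)

/* Hypothetical stand-in for one of the tested abs variants. */
static int simple_abs(int x)
{
    return x < 0 ? -x : x;
}

/* Reference version: one call per loop iteration. */
unsigned long sum_plain(unsigned long count)
{
    unsigned long sum = 0;
    while (count-- != 0) {
        int a = RAND_VAL();
        sum += (unsigned long)simple_abs(a);
    }
    return sum;
}

/* Unrolled by 4, followed by a remainder loop for count % 4. */
unsigned long sum_unrolled4(unsigned long count)
{
    unsigned long sum = 0;
    unsigned long n;
    int a;

    for (n = count / 4; n != 0; --n) {
        a = RAND_VAL(); sum += (unsigned long)simple_abs(a);
        a = RAND_VAL(); sum += (unsigned long)simple_abs(a);
        a = RAND_VAL(); sum += (unsigned long)simple_abs(a);
        a = RAND_VAL(); sum += (unsigned long)simple_abs(a);
    }
    for (n = count % 4; n != 0; --n) {
        a = RAND_VAL(); sum += (unsigned long)simple_abs(a);
    }
    return sum;
}
```

The sketch uses for loops so that a count below 4 is also handled; the do/while form in the macro relies on N_ITERATIONS being at least 4, since with n == 0 the first `--n` would wrap around.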
MSVC -Ox2 -Ob2 -G6 -Gr -GF, library rand():
nothing     3951541892  12.578
abs         1713113360  13.699
simple_abs  1713113360  17.395
omid_abs    1713113360  13.900
sbb_abs     1713113360  18.116
cdq_abs     1713113360  20.009
fish_abs    1713113360  19.969
sar_abs     1713113360  18.186
cmovl_abs   1713113360  17.165
cmovs_abs   1713113360  18.436
Omid abs is back to normal speed. Otherwise, unrolling has little effect. I think
this is to be expected, assuming that rand() is the slowest part of everything
and that the branches of the long loop are always predicted correctly. The
strangeness remains: without unrolling, I can always reproduce the 7.x seconds.
MSVC -Ox2 -Ob2 -G6 -Gr -GF, sr32_rand:
nothing     1305123480   5.758
abs         2955546426   6.159
simple_abs  2955546426  12.918
omid_abs    2955546426   6.119
sbb_abs     2955546426  15.713
cdq_abs     2955546426  17.605
fish_abs    2955546426  20.600
sar_abs     2955546426  18.186
cmovl_abs   2955546426  17.635
cmovs_abs   2955546426  17.826
Gcc -O3, sr32_rand:
nothing     1305123480   5.714
abs         2955546426  12.747
simple_abs  2955546426  12.747
omid_abs    2955546426   5.879
sbb_abs     2955546426   6.593
cdq_abs     2955546426   6.099
fish_abs    2955546426  12.747
sar_abs     2955546426   6.154
cmovl_abs   2955546426   5.769
cmovs_abs   2955546426   5.714
Gcc -O3 -march=pentium4, sr32_rand:
nothing     1305123480   5.714
abs         2955546426   5.714
simple_abs  2955546426   5.714
omid_abs    2955546426   6.044
sbb_abs     2955546426   6.538
cdq_abs     2955546426   6.044
fish_abs    2955546426  12.802
sar_abs     2955546426   6.154
cmovl_abs   2955546426   5.769
cmovs_abs   2955546426   5.769
Library abs() and simple_abs() are fast now, because conditional moves are used.
In AT&T syntax:
movl %eax, %ecx
negl %ecx
cmpl $-1, %eax
cmovle %ecx, %eax
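For readers who do not read AT&T syntax, the four instructions above correspond roughly to the following C. This is only a sketch of what the instruction sequence computes; the function name cmov_style_abs is made up for illustration, and whether a given compiler actually emits a conditional move for it depends on the compiler and the target options.

```c
/* Hypothetical name; mirrors the instruction sequence step by step. */
int cmov_style_abs(int x)
{
    /* movl %eax, %ecx ; negl %ecx
       negl wraps around for INT_MIN. Negating through unsigned avoids
       signed overflow; converting the result back to int is
       implementation-defined for that one input (typically INT_MIN). */
    int neg = (int)(0u - (unsigned)x);

    /* cmpl $-1, %eax ; cmovle %ecx, %eax
       Take the negated copy only when x <= -1, i.e. when x is negative. */
    return (x <= -1) ? neg : x;
}
```

Both values are computed up front and one of them is selected afterwards, so no branch on the sign of x is needed.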
>Doubling the speed of a function by adding additional abs code - not bad ;-)
Indeed. Or is it the other way around: for some strange reason, rand() does not
run at normal speed here? I cannot get a loop with only rand() (no sum, etc.) to
run as fast as it does together with omid_abs.
Regards,
Dieter