Author: Dieter Buerssner
Date: 11:27:51 07/06/03
On July 06, 2003 at 13:29:56, Gerd Isenberg wrote:

>What about unrolling the loop a bit, eg. repeat the body statement 2..10 times.

I unrolled by 4; I changed DECLARE_TEST_FUNCTION to

#define DECLARE_TEST_FUNC(name) \
unsigned long tfunc_##name(void) \
{ \
  int a; \
  unsigned long n=N_ITERATIONS/4; \
  unsigned long sum=0; \
  do \
  { \
    a = RAND_VAL(); \
    sum += name(a); \
    a = RAND_VAL(); \
    sum += name(a); \
    a = RAND_VAL(); \
    sum += name(a); \
    a = RAND_VAL(); \
    sum += name(a); \
  } while (--n != 0); \
  n = N_ITERATIONS%4; \
  if (n) \
    do \
    { \
      a = RAND_VAL(); \
      sum += name(a); \
    } while (--n != 0); \
  return sum; \
}

MSVC -Ox2 -Ob2 -G6 -Gr -GF, library rand()

nothing      3951541892   12.578
abs          1713113360   13.699
simple_abs   1713113360   17.395
omid_abs     1713113360   13.900
sbb_abs      1713113360   18.116
cdq_abs      1713113360   20.009
fish_abs     1713113360   19.969
sar_abs      1713113360   18.186
cmovl_abs    1713113360   17.165
cmovs_abs    1713113360   18.436

omid_abs is back to normal speed. Unrolling has little effect otherwise. I think
this is to be expected, assuming that rand() is the slowest part of everything
and that the branches of the long loop are always predicted correctly.

The strangeness remains. Without unrolling, I can always reproduce the 7.x
seconds.
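For readers who want to check the unrolling pattern in isolation, here is a minimal sketch of the same idea as plain functions rather than a macro. The names rand_val, seed_rand_val, simple_abs_sketch, sum_rolled, sum_unrolled4 and same_result are hypothetical stand-ins, not the test harness from this thread, and the small generator is only an assumed replacement for RAND_VAL(). The sketch uses while loops, so it also copes with fewer than 4 iterations, which the do/while form in the macro above does not.

```c
/* Hypothetical stand-in for RAND_VAL(): a small LCG returning a value
   in the range -16384..16383, so that abs has negative inputs to work on. */
static unsigned long lcg_state = 1;

void seed_rand_val(unsigned long seed)
{
    lcg_state = seed;
}

int rand_val(void)
{
    lcg_state = lcg_state * 1103515245UL + 12345UL;
    return (int)((lcg_state >> 16) & 0x7fff) - 0x4000;
}

/* Hypothetical stand-in for one of the abs candidates. */
int simple_abs_sketch(int a)
{
    return a < 0 ? -a : a;
}

/* Reference version: one call per loop iteration. */
unsigned long sum_rolled(unsigned long iterations)
{
    unsigned long sum = 0;
    while (iterations-- != 0)
        sum += (unsigned long)simple_abs_sketch(rand_val());
    return sum;
}

/* Unrolled by 4, with a remainder loop for iterations % 4. */
unsigned long sum_unrolled4(unsigned long iterations)
{
    unsigned long sum = 0;
    unsigned long n = iterations / 4;

    while (n-- != 0) {
        sum += (unsigned long)simple_abs_sketch(rand_val());
        sum += (unsigned long)simple_abs_sketch(rand_val());
        sum += (unsigned long)simple_abs_sketch(rand_val());
        sum += (unsigned long)simple_abs_sketch(rand_val());
    }
    n = iterations % 4;
    while (n-- != 0)
        sum += (unsigned long)simple_abs_sketch(rand_val());
    return sum;
}

/* Returns 1 if both versions produce the same sum from the same seed. */
int same_result(unsigned long iterations)
{
    unsigned long expected, actual;

    seed_rand_val(1);
    expected = sum_rolled(iterations);
    seed_rand_val(1);
    actual = sum_unrolled4(iterations);
    return expected == actual;
}
```

Both versions consume the random sequence in the same order, so equal sums are the sanity check here, much like the identical sums in the second column of the tables.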
MSVC -Ox2 -Ob2 -G6 -Gr -GF, sr32_rand:

nothing      1305123480    5.758
abs          2955546426    6.159
simple_abs   2955546426   12.918
omid_abs     2955546426    6.119
sbb_abs      2955546426   15.713
cdq_abs      2955546426   17.605
fish_abs     2955546426   20.600
sar_abs      2955546426   18.186
cmovl_abs    2955546426   17.635
cmovs_abs    2955546426   17.826

Gcc -O3, sr32_rand:

nothing      1305123480    5.714
abs          2955546426   12.747
simple_abs   2955546426   12.747
omid_abs     2955546426    5.879
sbb_abs      2955546426    6.593
cdq_abs      2955546426    6.099
fish_abs     2955546426   12.747
sar_abs      2955546426    6.154
cmovl_abs    2955546426    5.769
cmovs_abs    2955546426    5.714

Gcc -O3 -march=pentium4, sr32_rand:

nothing      1305123480    5.714
abs          2955546426    5.714
simple_abs   2955546426    5.714
omid_abs     2955546426    6.044
sbb_abs      2955546426    6.538
cdq_abs      2955546426    6.044
fish_abs     2955546426   12.802
sar_abs      2955546426    6.154
cmovl_abs    2955546426    5.769
cmovs_abs    2955546426    5.769

Library abs() and simple_abs() are fast now, because conditional moves are used.
In AT&T syntax:

  movl   %eax, %ecx
  negl   %ecx
  cmpl   $-1, %eax
  cmovle %ecx, %eax

>Doubling the speed of a function by adding additional abs code - not bad ;-)

Indeed. Or is it the other way around? Perhaps, for some strange reason, rand()
does not run at normal speed here? I cannot get a loop with only rand() (no sum,
etc.) as fast as together with omid_abs.

Regards,
Dieter
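As an illustration of the conditional-move sequence quoted in the post, here is a minimal C sketch of two ways to write abs without relying on a branch. The names select_abs and mask_abs are hypothetical and are not the implementations benchmarked in this thread. Whether a compiler actually turns the first form into cmov depends on the compiler and target flags, so treat that as an assumption to verify in the generated assembly. Like the library abs(), neither function handles INT_MIN.

```c
/* Select form: the same logic as the neg/cmp/cmov sequence, i.e. compute
   the negated value and pick it when the input is below zero. A compiler
   may (or may not) emit a conditional move for this. */
int select_abs(int a)
{
    int negated = -a;
    return a <= -1 ? negated : a;
}

/* Mask form: mask is -1 for negative inputs and 0 otherwise.
   (a ^ -1) - (-1) equals ~a + 1, which is -a in two's complement;
   (a ^ 0) - 0 leaves a unchanged. */
int mask_abs(int a)
{
    int mask = -(a < 0);
    return (a ^ mask) - mask;
}
```

For example, mask_abs(-5) computes (-5 ^ -1) - (-1) = 4 + 1 = 5, while select_abs(-5) simply picks the negated value.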
Last modified: Thu, 15 Apr 21 08:11:13 -0700