Author: Dieter Buerssner
Date: 16:51:48 02/05/02
On February 05, 2002 at 17:29:47, Dann Corbit wrote:

>On February 05, 2002 at 16:26:08, Tom Likens wrote:
>[snip]
>>I'd be careful with Gnu's -funroll-loops (and especially -funroll-all-loops)
>>since these switches can actually make your program run slower by causing
>>a cache miss/penalty.

Yes. Typically, applications I have written with speed in mind get slower with -funroll-loops. They always get slower with -funroll-all-loops (the gcc manual also mentions this). With very small test programs, results may differ. Of course, it depends on many details. For example, one could group code into modules according to the optimization options that suit it, and then compile each module with different options. I find this very impractical, and prefer to unroll loops manually now and then, when I see that it can make a difference.

>I just use -O3 and let the compiler decide.

I have a totally different experience. All programs I care about get slower with -O3 in gcc. One reason is excessive inlining and the resulting code bloat. Another reason, in many cases, is worse register allocation caused by the inlining. I know this may not sound convincing, but it is my experience. Have fun testing the following code with your favorite compilers and optimization options. Here the manual unrolling always wins, typically by a factor of two (naturally depending on the size of the array and other factors).
Regards,
Dieter

#include <stdlib.h>
#include <stdio.h>
#include <time.h>

long sum1(const int *a, size_t n);
long sumu8(const int *a, size_t n);

int main(int argc, char *argv[])
{
    size_t n = 7853;
    long i, nloops = 1000000L;
    clock_t t;
    int *a;
    double seconds, sum; /* So it will not overflow so fast */

    if (argc > 1)
        nloops = atol(argv[1]); /* Don't care about any checks here */
    printf("Expecting a sum of %.0f\n", 0.5*nloops*n*(n-1));

    a = malloc(n * sizeof *a);
    if (a == NULL)
        return EXIT_FAILURE;
    for (i = 0; i < (long)n; i++)
        a[i] = (int)i;

    /* Time the plain loop */
    t = clock();
    sum = 0;
    for (i = 0; i < nloops; i++)
        sum += sum1(a, n);
    seconds = (double)(clock() - t) / CLOCKS_PER_SEC;
    printf("sum normal: sum=%.0f, used %.3f seconds\n", sum, seconds);

    /* Time the manually unrolled loop */
    t = clock();
    sum = 0;
    for (i = 0; i < nloops; i++)
        sum += sumu8(a, n);
    seconds = (double)(clock() - t) / CLOCKS_PER_SEC;
    printf("sum unrolled: sum=%.0f, used %.3f seconds\n", sum, seconds);

    free(a);
    return EXIT_SUCCESS;
}

long sum1(const int *a, size_t n)
{
    size_t i;
    long sum = 0;

    for (i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* The same unrolled */
long sumu8(const int *a, size_t n)
{
    long sum = 0;

    /* First add the n % 8 leftover elements */
    switch (n % 8) {
    case 7: sum += *a++; /* All fall through */
    case 6: sum += *a++;
    case 5: sum += *a++;
    case 4: sum += *a++;
    case 3: sum += *a++;
    case 2: sum += *a++;
    case 1: sum += *a++;
    }
    /* Then eight elements per iteration */
    for (n /= 8; n != 0; --n) {
        sum += a[0];
        sum += a[1];
        sum += a[2];
        sum += a[3];
        sum += a[4];
        sum += a[5];
        sum += a[6];
        sum += a[7];
        a += 8;
    }
    return sum;
}