Computer Chess Club Archives



Subject: Re: One (silly) question about "C"

Author: Dieter Buerssner

Date: 16:51:48 02/05/02



On February 05, 2002 at 17:29:47, Dann Corbit wrote:

>On February 05, 2002 at 16:26:08, Tom Likens wrote:
>[snip]
>>I'd be careful with Gnu's -funroll-loops (and especially -funroll-all-loops)
>>since these switches can actually make your program run slower by causing
>>a cache miss/penalty.

Yes. Applications I have written with speed in mind typically get slower with
-funroll-loops. They always get slower with -funroll-all-loops (the gcc
manual also mentions this). With very small test programs, the results may be
different. Of course, it depends on many details. For example, one could sort
the modules by which optimization options suit them, and then compile different
modules with different options. I find this very impractical, and prefer to
unroll loops manually now and then, when I see that it can make a difference.

>I just use -O3 and let the compiler decide.

I have a totally different experience. All programs I care about get slower
with -O3 in gcc. One reason is excessive inlining, which bloats the generated
code. Another reason, in many cases, is worse register allocation caused by the
inlining. I know this may not sound convincing, but it is my experience.

Have fun testing the following code with your favorite compilers and
optimization options. Here the manual unrolling always wins, typically by a
factor of two (naturally depending on the size of the array and other things).

Regards,
Dieter


#include <stdlib.h>
#include <stdio.h>
#include <time.h>

long sum1(int *a, size_t n);
long sumu8(int *a, size_t n);

int main(int argc, char *argv[])
{
  size_t n = 7853;
  long i, nloops=1000000L;
  clock_t t;
  int *a;
  double seconds, sum; /* So it will not overflow so fast */

  if (argc > 1)
    nloops = atol(argv[1]); /* Don't care about any checks here */

  printf("Expecting a sum of %.0f\n", 0.5*nloops*n*(n-1));
  a = malloc(n * sizeof *a);
  if (a == NULL)
    return EXIT_FAILURE;
  for (i=0; i<n; i++)
    a[i] = i;

  t = clock();
  sum=0;
  for (i=0; i<nloops; i++)
    sum += sum1(a, n);
  seconds = (double)(clock()-t)/CLOCKS_PER_SEC;
  printf("sum normal:   sum=%.0f, used %.3f seconds\n", sum, seconds);

  t = clock();
  sum=0;
  for (i=0; i<nloops; i++)
    sum += sumu8(a, n);
  seconds = (double)(clock()-t)/CLOCKS_PER_SEC;
  printf("sum unrolled: sum=%.0f, used %.3f seconds\n", sum, seconds);
  free(a);
  return EXIT_SUCCESS;
}

long sum1(int *a, size_t n)
{
  size_t i;
  long sum=0;

  for (i=0; i<n; i++)
    sum += a[i];
  return sum;
}

/* The same sum, manually unrolled by eight */
long sumu8(int *a, size_t n)
{
  long sum=0;

  switch (n%8) /* first add the n%8 leftover elements */
  {
    case 7: sum += *a++; /* All fall through */
    case 6: sum += *a++;
    case 5: sum += *a++;
    case 4: sum += *a++;
    case 3: sum += *a++;
    case 2: sum += *a++;
    case 1: sum += *a++;
  }
  for (n /= 8; n != 0; --n) /* then the n/8 blocks of eight */
  {
    sum += a[0];
    sum += a[1];
    sum += a[2];
    sum += a[3];
    sum += a[4];
    sum += a[5];
    sum += a[6];
    sum += a[7];
    a += 8;
  }
  return sum;
}





Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.