Computer Chess Club Archives



Subject: Re: is this really faster?

Author: Dieter Buerssner

Date: 13:00:05 04/22/03



On April 21, 2003 at 19:58:42, Filip Tvrzsky wrote:

>On April 21, 2003 at 19:31:15, Filip Tvrzsky wrote:
>>
>>It is very strange, but my test results are rather different!
>>Compiler is gcc 3.2, CPU is Duron 600@900.
>>
>>1) With only -O3 optimization:
>>   res 3880000000
>>   gerd()          32,520 s
>>   crafty()        47,950 s
>>   dieter()        47,010 s
>>
>>2) With the following optimization settings (-O3 and -fsave-memoized
>>-fno-exceptions -fmerge-all-constants -save-temps -march=athlon -mcpu=athlon
>>-mmmx -funroll-loops -fomit-frame-pointer), which are my favorites now:
>>   res 3880000000
>>   gerd()          18,350 s
>>   crafty()        37,290 s
>>   dieter()        41,750 s
>>
>>Maybe I should investigate the compiler's assembler output a little ...
>And with -O2 instead of -O3.
>
>3) Only -O2:
>   gerd()           32,520 s
>   crafty()         45,470 s
>   dieter()         47,680 s
>
>4) -O2 -fsave-memoized -fno-exceptions -fmerge-all-constants -save-temps
>-march=athlon -mcpu=athlon -mmmx -funroll-loops -fomit-frame-pointer:
>   gerd()           19,060 s
>   crafty()         39,710 s
>   dieter()         41,080 s

Filip, this is really strange. Your special options speed up gerd() by almost a
factor of 2. I looked at the assembler output, and I cannot explain it. I also
cannot reproduce it here. I don't have an AMD CPU, but I can use -march=pentium4.
Whatever options I try, the speed stays about the same. Also, gerd() and dieter()
are always comparable in speed (as the assembler output suggests). All the work
is done inside the function, and I tried to write the code so that exactly that
work is measured. Where does this big factor come from?
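The idea of timing only the work done inside the function can be illustrated with a minimal C sketch. It assumes a hypothetical stand-in function `popcount_sparse()` (not gerd(), crafty() or dieter(), whose code is not shown in this post), and `bench()` and `N_INPUTS` are likewise illustrative names. Inputs are prepared before the clock starts, and the results are summed and returned so the compiler cannot discard the calls as dead code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define N_INPUTS 4096  /* hypothetical size of the precomputed input table */

/* Hypothetical function under test (a stand-in, not gerd(), crafty() or
   dieter()): counts set bits by clearing the lowest one per iteration. */
static int popcount_sparse(uint64_t b)
{
    int n = 0;
    while (b) {
        b &= b - 1;
        n++;
    }
    return n;
}

/* Call f on every precomputed input, `rounds` times, and return the summed
   results; the caller should print the sum so the calls stay live. */
static uint64_t bench(int (*f)(uint64_t), long rounds, double *seconds)
{
    static uint64_t inputs[N_INPUTS];
    uint64_t x = UINT64_C(0x9E3779B97F4A7C15), sum = 0;
    assert(f != NULL);
    /* Prepare inputs with xorshift64 (13, 7, 17) before the clock starts,
       so only f itself (plus loop and call overhead) is timed. */
    for (int i = 0; i < N_INPUTS; i++) {
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        inputs[i] = x;
    }
    clock_t start = clock();
    for (long r = 0; r < rounds; r++)
        for (int i = 0; i < N_INPUTS; i++)
            sum += (uint64_t)f(inputs[i]);
    *seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    return sum;
}
```

For example, `double t; uint64_t res = bench(popcount_sparse, 10000, &t);` followed by `printf("res %llu  %.3f s\n", (unsigned long long)res, t);`. Calling through a function pointer keeps the call overhead identical for every candidate, but it also prevents inlining, so the numbers may differ from a build in which the function is inlined.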

Your result reminds me of the following:

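typedef unsigned long long ul64;  /* assumed: 64-bit unsigned type */
static ul64 zseedc = 1;           /* assumed: any nonzero seed */

/* Multiply-with-carry step: the low 32 bits of zseedc hold the last
   output, the high 32 bits hold the carry. */
unsigned long mwc32c(void)
{
  unsigned long l1, l2;
  ul64 res;
  static const unsigned long mul=999996864UL;
      /* ^^^^^ */
  l1 = (unsigned long)(zseedc & 0xffffffffUL);
  l2 = zseedc>>32;
  res = l2+l1*(ul64)mul;
  zseedc = res;
  return (unsigned long)(res & 0xffffffffUL);
}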

On a K6-2 233, compiled with gcc 2.95.2, the above code ran about 10 (!) times
faster when I deleted the const. Actually, it was the other way around: I started
without the const, reviewed my source, and added the const to make the source
"cleaner". Later I noticed that a simulation program took much longer to run. It
took me a very long time to see that the const was the problem. This could be
confirmed on other K6-2 machines; on other CPUs, there was no difference. I
investigated more closely (at the assembly level). With const, the constant was
placed in the code segment (why not?); without const, it was in the data segment.
When I added some filler bytes (I think 12 bytes were enough) between the
constant and the code, the speed was as expected again. Later, it was concluded
that this was some sort of K6-2 bug.
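For reference, mwc32c() is a multiply-with-carry generator: with multiplier a = 999996864, each step computes t = a*x + c from the last output x (low 32 bits of the state) and the carry c (high 32 bits), then keeps x' = t mod 2^32 and c' = t / 2^32. Because a < 2^30, t always fits in 64 bits. The sketch below is not the original simulation code; it uses <stdint.h> fixed-width types and hypothetical names (`mwc_const`, `mwc_plain`, `variants_agree`, `seed_const`, `seed_plain`) to put a const and a non-const version side by side and check that they produce the same numbers, so any speed difference between them can only come from code generation and memory placement.

```c
#include <assert.h>
#include <stdint.h>

/* Two hypothetical copies of the generator state, so the variants can be
   compared step by step. Low 32 bits: last output x; high 32 bits: carry c. */
static uint64_t seed_const = 1, seed_plain = 1;

/* Multiplier stored as a static const object, as in mwc32c() above. */
static uint32_t mwc_const(void)
{
    static const uint32_t mul = 999996864u;
    uint64_t t = (seed_const & 0xffffffffu) * (uint64_t)mul + (seed_const >> 32);
    seed_const = t;  /* new carry in the high half, new x in the low half */
    return (uint32_t)t;
}

/* Identical except that the multiplier is not const. */
static uint32_t mwc_plain(void)
{
    static uint32_t mul = 999996864u;
    uint64_t t = (seed_plain & 0xffffffffu) * (uint64_t)mul + (seed_plain >> 32);
    seed_plain = t;
    return (uint32_t)t;
}

/* Return 1 if both variants produce the same first n outputs from state s. */
static int variants_agree(uint64_t s, long n)
{
    assert(s != 0);  /* state 0 is a fixed point: the generator would emit 0 forever */
    seed_const = seed_plain = s;
    for (long i = 0; i < n; i++)
        if (mwc_const() != mwc_plain())
            return 0;
    return 1;
}
```

On a current compiler both variants may well compile to the same instructions (the multiplier becoming an immediate operand), in which case there is nothing to measure; `gcc -S` shows where the constant actually ends up.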

Regards,
Dieter




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.