Author: Dieter Buerssner
Date: 15:27:33 10/21/02
On October 21, 2002 at 06:27:09, Gerd Isenberg wrote:
>from AMD Athlon™ Processor x86 Code Optimization Guide:
>Chapter 3 C Source-Level Optimizations
>
>Use Const Type Qualifier
[...]
My little programming anecdote about using const on an AMD K6-2. This small function:
typedef unsigned long long ul64;
/* Two implementations of the multiply-with-carry RNG.
The only difference is the type of mul */
static ul64 zseed = ((ul64)0x12345678UL<<32) | 0x87654321UL;
unsigned long mwc32(void)
{
    unsigned long l1, l2;
    ul64 res;
    static unsigned long mul=999996864UL;
    l1 = (unsigned long)(zseed & 0xffffffffUL);
    l2 = zseed>>32;
    res = l2+l1*(ul64)mul;
    zseed = res;
    return (unsigned long)(res & 0xffffffffUL);
}
performed well. It is somewhat micro-optimized for gcc and produces almost optimal
assembly. (Part of the micro-optimization was to use a variable for the constant
mul.)
After reviewing one of my programs, I changed the type of mul by adding const.
Much later, I noticed that the program was running much slower. I did not suspect
that this had anything to do with the const, and it took a long time to find the
cause. Further investigation showed that the routine with the const was about 8
times slower. Very strange. The assembly was exactly the same for both versions;
the only difference was that the variable mul was now in the code segment instead
of the data segment. Data alignment was perfect in both cases. When I added some
"dummy space" after the variable mul in the assembly, the speed went back to
normal. The behaviour could be reproduced on other K6-2 processors, but not on
Pentium processors. On the latter, both functions performed identically. The
thread can be found via dejanews.
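For anyone who wants to compare the two variants, here is a minimal sketch. It is not the original test program. The names mwc32_var, mwc32_const, check_variants_agree and time_calls are made up for illustration, and the fixed-width types from <stdint.h> are an assumption that replaces unsigned long, which is not 32 bits wide on every platform. A current optimizing compiler will most likely fold the constant into the instruction stream, so no speed difference should be expected on modern hardware.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Each variant gets its own seed so the two sequences can be compared. */
static uint64_t seed_var   = ((uint64_t)0x12345678UL << 32) | 0x87654321UL;
static uint64_t seed_const = ((uint64_t)0x12345678UL << 32) | 0x87654321UL;

/* Variant A: the multiplier is a writable static (data segment). */
uint32_t mwc32_var(void)
{
    static uint32_t mul = 999996864UL;
    uint32_t lo    = (uint32_t)(seed_var & 0xffffffffUL);
    uint32_t carry = (uint32_t)(seed_var >> 32);
    seed_var = carry + lo * (uint64_t)mul;   /* 64-bit multiply plus carry */
    return (uint32_t)(seed_var & 0xffffffffUL);
}

/* Variant B: identical except for the const qualifier on mul. */
uint32_t mwc32_const(void)
{
    static const uint32_t mul = 999996864UL;
    uint32_t lo    = (uint32_t)(seed_const & 0xffffffffUL);
    uint32_t carry = (uint32_t)(seed_const >> 32);
    seed_const = carry + lo * (uint64_t)mul;
    return (uint32_t)(seed_const & 0xffffffffUL);
}

/* Hypothetical helper: both variants must produce the same numbers. */
void check_variants_agree(long n)
{
    long i;
    for (i = 0; i < n; i++)
        assert(mwc32_var() == mwc32_const());
}

/* Hypothetical helper: CPU seconds for n calls of rng. The XOR of all
   results is stored in *sink so the loop cannot be optimized away. */
double time_calls(uint32_t (*rng)(void), long n, uint32_t *sink)
{
    uint32_t acc = 0;
    long i;
    clock_t t0 = clock();
    for (i = 0; i < n; i++)
        acc ^= rng();
    *sink = acc;
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

Calling check_variants_agree(1000) first, and then time_calls(mwc32_var, n, &sink) and time_calls(mwc32_const, n, &sink) with a large n, gives the two timings to compare.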
Cheers,
Dieter