Author: Dann Corbit
Date: 10:48:37 09/24/01
Go up one level in this thread
On September 22, 2001 at 21:12:58, Tom Kerrigan wrote:
>On September 21, 2001 at 22:23:38, Dann Corbit wrote:
>
>>On September 21, 2001 at 21:03:41, Tom Kerrigan wrote:
>>
>>>On September 20, 2001 at 18:24:07, Antonio Dieguez wrote:
>>>
>>>>can you tell me?
>>>>
>>>>thanks.
>>>
>>>Should give you the same performance for straightforward situations. Processors
>>>and compilers are optimized so that these cute little tricks are not necessary
>>>to get good performance.
>>
>>Ought to, perhaps. But in real life, it does not always work out that way.
>>
>>But I have seen a program speed up by 1/3 by application of a chummy hack like
>>that.
>
>Do you still have the details about this 1/3 speedup? Might be interesting to
>look at those. I'm sure whoever manufactured the compiler would find it
>interesting, too. :)
I don't remember which program it was. 20-30% is not at all unusual. I was
trying to figure out what was going on, so I did this test case to simplify.
During profiling, I found up to 50% speed difference, where the function pointer
or array of functions notion was always much faster than the switch.
It seems that with the advanced CPU's, a missed branch prediction can be
ridiculously expensive.
#include <math.h>
#include <stdio.h>
typedef double (*f_t) (double);
static f_t f[] = {log, log10, sqrt, cos, cosh, exp, sin, sinh, tan, tanh,
0};
static double accum0 = 0;
static double accum1 = 0;
static double accum2 = 0;
void arr(void)
{
int i;
double d = 0;
for (i = 0; f[i]; i++) {
d += f[i] (0.5);
}
accum0 += d;
}
void poi(void)
{
f_t *flist = f;
double d = 0;
while (*flist) {
f_t ff = *flist;
d += ff(0.5);
flist++;
}
accum1 += d;
}
void swi(void)
{
int i;
double d = 0;
for (i = 0; f[i]; i++) {
switch (i) {
case 0:
d += f[0] (0.5);
break;
case 1:
d += f[1] (0.5);
break;
case 2:
d += f[2] (0.5);
break;
case 3:
d += f[3] (0.5);
break;
case 4:
d += f[4] (0.5);
break;
case 5:
d += f[5] (0.5);
break;
case 6:
d += f[6] (0.5);
break;
case 7:
d += f[7] (0.5);
break;
case 8:
d += f[8] (0.5);
break;
case 9:
d += f[9] (0.5);
break;
default:
break;
}
}
accum2 += d;
}
int main(void)
{
long i;
for (i = 0; i < 1000000; i++) {
arr();
poi();
swi();
}
printf("%.20g, %.20g, %.20g\n", accum0, accum1, accum2);
return 0;
}
Here is an example of usage from actual code in the Beowulf chess engine:
#ifdef USE_FUNCTION_LIST
score += fllist[p + 6] (sq, LastBlackPawn, LastWhitePawn, Pawns,
&npawns);
#else
switch (p) {
/* White Piece */
case (wpawn):
score += PAWN_SCORE + PawnPosition[sq] + RankBonus[Rank(sq)];
LastWhitePawn[File(sq)] = sq;
Pawns[npawns++] = sq;
break;
case (wrook):
score += ROOK_SCORE + RookPosition[sq];
break;
case (wknight):
score += KNIGHT_SCORE + KnightPosition[sq];
break;
case (wbishop):
score += BISHOP_SCORE + BishopPosition[sq];
break;
case (wqueen):
score += QUEEN_SCORE + QueenPosition[sq];
break;
/* Black Piece */
case (bpawn):
score -= PAWN_SCORE + PawnPosition[(Flip[sq])] + RankBonus[7 -
Rank(sq)];
if (LastBlackPawn[File(sq)] == 64)
LastBlackPawn[File(sq)] = sq;
Pawns[npawns++] = sq;
break;
case (brook):
score -= ROOK_SCORE + RookPosition[(Flip[sq])];
break;
case (bknight):
score -= KNIGHT_SCORE + KnightPosition[(Flip[sq])];
break;
case (bbishop):
score -= BISHOP_SCORE + BishopPosition[(Flip[sq])];
break;
case (bqueen):
score -= QUEEN_SCORE + QueenPosition[(Flip[sq])];
break;
}
#endif
Colin's pretty and clear code is expressed by my bletcherous list of functions.
Before the change, this branch of code was the bottleneck. After the change,
speed increased about 7% and the bottleneck formed elsewhere.
Of course, this will be *very* compiler dependent and also will change greatly
from CPU to CPU, even using the same compiler.
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.