Author: Dann Corbit
Date: 10:48:37 09/24/01
Go up one level in this thread
On September 22, 2001 at 21:12:58, Tom Kerrigan wrote: >On September 21, 2001 at 22:23:38, Dann Corbit wrote: > >>On September 21, 2001 at 21:03:41, Tom Kerrigan wrote: >> >>>On September 20, 2001 at 18:24:07, Antonio Dieguez wrote: >>> >>>>can you tell me? >>>> >>>>thanks. >>> >>>Should give you the same performance for straightforward situations. Processors >>>and compilers are optimized so that these cute little tricks are not necessary >>>to get good performance. >> >>Ought to, perhaps. But in real life, it does not always work out that way. >> >>But I have seen a program speed up by 1/3 by application of a chummy hack like >>that. > >Do you still have the details about this 1/3 speedup? Might be interesting to >look at those. I'm sure whoever manufactured the compiler would find it >interesting, too. :) I don't remember which program it was. 20-30% is not at all unusual. I was trying to figure out what was going on, so I did this test case to simplify. During profiling, I found up to 50% speed difference, where the function pointer or array of functions notion was always much faster than the switch. It seems that with the advanced CPU's, a missed branch prediction can be ridiculously expensive. #include <math.h> #include <stdio.h> typedef double (*f_t) (double); static f_t f[] = {log, log10, sqrt, cos, cosh, exp, sin, sinh, tan, tanh, 0}; static double accum0 = 0; static double accum1 = 0; static double accum2 = 0; void arr(void) { int i; double d = 0; for (i = 0; f[i]; i++) { d += f[i] (0.5); } accum0 += d; } void poi(void) { f_t *flist = f; double d = 0; while (*flist) { f_t ff = *flist; d += ff(0.5); flist++; } accum1 += d; } void swi(void) { int i; double d = 0; for (i = 0; f[i]; i++) { switch (i) { case 0: d += f[0] (0.5); break; case 1: d += f[1] (0.5); break; case 2: d += f[2] (0.5); break; case 3: d += f[3] (0.5); break; case 4: d += f[4] (0.5); break; case 5: d += f[5] (0.5); break; case 6: d += f[6] (0.5); break; case 7: d += f[7] (0.5); break; case 8: d += f[8] (0.5); break; case 9: d += f[9] (0.5); break; default: break; } } accum2 += d; } int main(void) { long i; for (i = 0; i < 1000000; i++) { arr(); poi(); swi(); } printf("%.20g, %.20g, %.20g\n", accum0, accum1, accum2); return 0; } Here is an example of usage from actual code in the Beowulf chess engine: #ifdef USE_FUNCTION_LIST score += fllist[p + 6] (sq, LastBlackPawn, LastWhitePawn, Pawns, &npawns); #else switch (p) { /* White Piece */ case (wpawn): score += PAWN_SCORE + PawnPosition[sq] + RankBonus[Rank(sq)]; LastWhitePawn[File(sq)] = sq; Pawns[npawns++] = sq; break; case (wrook): score += ROOK_SCORE + RookPosition[sq]; break; case (wknight): score += KNIGHT_SCORE + KnightPosition[sq]; break; case (wbishop): score += BISHOP_SCORE + BishopPosition[sq]; break; case (wqueen): score += QUEEN_SCORE + QueenPosition[sq]; break; /* Black Piece */ case (bpawn): score -= PAWN_SCORE + PawnPosition[(Flip[sq])] + RankBonus[7 - Rank(sq)]; if (LastBlackPawn[File(sq)] == 64) LastBlackPawn[File(sq)] = sq; Pawns[npawns++] = sq; break; case (brook): score -= ROOK_SCORE + RookPosition[(Flip[sq])]; break; case (bknight): score -= KNIGHT_SCORE + KnightPosition[(Flip[sq])]; break; case (bbishop): score -= BISHOP_SCORE + BishopPosition[(Flip[sq])]; break; case (bqueen): score -= QUEEN_SCORE + QueenPosition[(Flip[sq])]; break; } #endif Colin's pretty and clear code is expressed by my bletcherous list of functions. Before the change, this branch of code was the bottleneck. After the change, speed increased about 7% and the bottleneck formed elsewhere. Of course, this will be *very* compiler dependent and also will change greatly from CPU to CPU, even using the same compiler.
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.