Computer Chess Club Archives

Search

Terms
Messages

Subject: Re: when a & is faster than a && ?

Author: Dann Corbit
Date: 10:48:37 09/24/01
On September 22, 2001 at 21:12:58, Tom Kerrigan wrote:

>On September 21, 2001 at 22:23:38, Dann Corbit wrote:
>
>>On September 21, 2001 at 21:03:41, Tom Kerrigan wrote:
>>
>>>On September 20, 2001 at 18:24:07, Antonio Dieguez wrote:
>>>
>>>>can you tell me?
>>>>
>>>>thanks.
>>>
>>>Should give you the same performance for straightforward situations. Processors
>>>and compilers are optimized so that these cute little tricks are not necessary
>>>to get good performance.
>>
>>Ought to, perhaps.  But in real life, it does not always work out that way.
>>
>>But I have seen a program speed up by 1/3 by application of a chummy hack like
>>that.
>
>Do you still have the details about this 1/3 speedup? Might be interesting to
>look at those. I'm sure whoever manufactured the compiler would find it
>interesting, too. :)

I don't remember which program it was.  20-30% is not at all unusual.  I was
trying to figure out what was going on, so I did this test case to simplify.
During profiling, I found up to 50% speed difference, where the function pointer
or array of functions notion was always much faster than the switch.

It seems that with the advanced CPU's, a missed branch prediction can be
ridiculously expensive.

#include <math.h>
#include <stdio.h>

typedef double  (*f_t) (double);
static f_t      f[] = {log, log10, sqrt, cos, cosh, exp, sin, sinh, tan, tanh,
0};

static double   accum0 = 0;
static double   accum1 = 0;
static double   accum2 = 0;


void            arr(void)
{
    int             i;
    double          d = 0;
    for (i = 0; f[i]; i++) {
        d += f[i] (0.5);
    }
    accum0 += d;
}

void            poi(void)
{
    f_t            *flist = f;
    double          d = 0;
    while (*flist) {
        f_t             ff = *flist;
        d += ff(0.5);
        flist++;
    }
    accum1 += d;
}

void            swi(void)
{
    int             i;
    double          d = 0;
    for (i = 0; f[i]; i++) {
        switch (i) {
        case 0:
            d += f[0] (0.5);
            break;
        case 1:
            d += f[1] (0.5);
            break;
        case 2:
            d += f[2] (0.5);
            break;
        case 3:
            d += f[3] (0.5);
            break;
        case 4:
            d += f[4] (0.5);
            break;
        case 5:
            d += f[5] (0.5);
            break;
        case 6:
            d += f[6] (0.5);
            break;
        case 7:
            d += f[7] (0.5);
            break;
        case 8:
            d += f[8] (0.5);
            break;
        case 9:
            d += f[9] (0.5);
            break;
        default:
            break;
        }
    }
    accum2 += d;
}

int             main(void)
{
    long            i;
    for (i = 0; i < 1000000; i++) {
        arr();
        poi();
        swi();
    }
    printf("%.20g, %.20g, %.20g\n", accum0, accum1, accum2);
    return 0;
}

Here is an example of usage from actual code in the Beowulf chess engine:
#ifdef USE_FUNCTION_LIST
        score += fllist[p + 6] (sq, LastBlackPawn, LastWhitePawn, Pawns,
&npawns);
#else
        switch (p) {
            /* White Piece */
        case (wpawn):
            score += PAWN_SCORE + PawnPosition[sq] + RankBonus[Rank(sq)];
            LastWhitePawn[File(sq)] = sq;
            Pawns[npawns++] = sq;
            break;
        case (wrook):
            score += ROOK_SCORE + RookPosition[sq];
            break;
        case (wknight):
            score += KNIGHT_SCORE + KnightPosition[sq];
            break;
        case (wbishop):
            score += BISHOP_SCORE + BishopPosition[sq];
            break;
        case (wqueen):
            score += QUEEN_SCORE + QueenPosition[sq];
            break;
            /* Black Piece */
        case (bpawn):
            score -= PAWN_SCORE + PawnPosition[(Flip[sq])] + RankBonus[7 -
Rank(sq)];
            if (LastBlackPawn[File(sq)] == 64)
                LastBlackPawn[File(sq)] = sq;
            Pawns[npawns++] = sq;
            break;
        case (brook):
            score -= ROOK_SCORE + RookPosition[(Flip[sq])];
            break;
        case (bknight):
            score -= KNIGHT_SCORE + KnightPosition[(Flip[sq])];
            break;
        case (bbishop):
            score -= BISHOP_SCORE + BishopPosition[(Flip[sq])];
            break;
        case (bqueen):
            score -= QUEEN_SCORE + QueenPosition[(Flip[sq])];
            break;
        }
#endif

Colin's pretty and clear code is expressed by my bletcherous list of functions.
Before the change, this branch of code was the bottleneck.  After the change,
speed increased about 7% and the bottleneck formed elsewhere.

Of course, this will be *very* compiler dependent and also will change greatly
from CPU to CPU, even using the same compiler.
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.