Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Avoiding if ( expr ) and branch misprediction ?

Author: Dann Corbit

Date: 12:58:20 01/19/06

Go up one level in this thread


On January 19, 2006 at 15:19:22, Gerd Isenberg wrote:

>>>>assert(expression == 0 || expression == 1);
>>>>foo[expression]();
>>>
>>>No - avoid that for simple expressions!
>>
>>You might be surprised:
>>http://chessprogramming.org/cccsearch/ccc.php?art_id=178068
>>
>>Depends on the compiler, of course.
>
>Ok, a few years old - it was quite ok on P3 iirc.
>To my knowledge on recent processors (amd64) indirect branches with (not
>necessarily random) changed target addresses implies a missprediction.
>Also i would never replace a simple array access by indrect branch.
>
> a = x[boolexpression];
> a = foo[boolexpression](); // returns either x[false] or x[true]

The program:
#include <math.h>
#include <stdio.h>

typedef double  (*f_t) (double);
static f_t      f[] =
{log, log10, sqrt, cos, cosh, exp, sin, sinh, tan, tanh, 0};

static double   accum0 = 0;
static double   accum1 = 0;
static double   accum2 = 0;


void            arr(void)
{
    int             i;
    double          d = 0;
    for (i = 0; f[i]; i++) {
        d += f[i] (0.5);
    }
    accum0 += d;
}

void            poi(void)
{
    f_t            *flist = f;
    double          d = 0;
    while (*flist) {
        f_t             ff = *flist;
        d += ff(0.5);
        flist++;
    }
    accum1 += d;
}

void            swi(void)
{
    int             i;
    double          d = 0;
    for (i = 0; f[i]; i++) {
        switch (i) {
        case 0:
            d += f[0] (0.5);
            break;
        case 1:
            d += f[1] (0.5);
            break;
        case 2:
            d += f[2] (0.5);
            break;
        case 3:
            d += f[3] (0.5);
            break;
        case 4:
            d += f[4] (0.5);
            break;
        case 5:
            d += f[5] (0.5);
            break;
        case 6:
            d += f[6] (0.5);
            break;
        case 7:
            d += f[7] (0.5);
            break;
        case 8:
            d += f[8] (0.5);
            break;
        case 9:
            d += f[9] (0.5);
            break;
        default:
            break;
        }
    }
    accum2 += d;
}

int             main(void)
{
    long            i;
    for (i = 0; i < 1000000; i++) {
        arr();
        poi();
        swi();
    }
    printf("%.20g, %.20g, %.20g\n", accum0, accum1, accum2);
    return 0;
}

given compiler flags:
/Ox /Ob2 /Oi /Ot /Oy /GT /GL /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D
"_VC80_UPGRADE=0x0600" /D "_MBCS" /GF /FD /MT /Zp16 /Gy /Fp".\Release/test.pch"
/Fo".\Release/" /Fd".\Release/" /W3 /nologo /c /Zi /Gd /TP /errorReport:prompt

for compiler:
icrosoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.42 for 80x86
opyright (C) Microsoft Corporation.  All rights reserved.

Profile data:
Segment	Offset	Address	Size	Function	Class	Full Name	Module	Source File	Timer
Samples	Resolved Module Path	Timer events	Timer %
1	7536	11632	596	CItan_pentium4		CItan_pentium4	test.exe		156	c:\tmp\release\test.exe	156000	10.96275474
1	3904	8000	103	trandisp1		trandisp1	test.exe		149	c:\tmp\release\test.exe	149000	10.47083626
1	7019	11115	406	ctrandisp2		ctrandisp2	test.exe		104	c:\tmp\release\test.exe	104000	7.308503162
1	5488	9584	620	CIlog_pentium4		CIlog_pentium4	test.exe		101	c:\tmp\release\test.exe	101000	7.097680956
1	5056	9152	428	CIcos_pentium4		CIcos_pentium4	test.exe		100	c:\tmp\release\test.exe	100000	7.027406887
1	14384	18480	456	CIsin_pentium4		CIsin_pentium4	test.exe		86	c:\tmp\release\test.exe	86000	6.043569923
1	13024	17120	644	CIlog10_pentium4		CIlog10_pentium4	test.exe		85	c:\tmp\release\test.exe	85000	5.973295854
1	192	4288	408	swi		swi	test.exe	c:\tmp\test.c	85	c:\tmp\release\test.exe	85000	5.973295854
1	7476	11572	60	fload		fload	test.exe		80	c:\tmp\release\test.exe	80000	5.621925509
1	1600	5696	63	tan		tan	test.exe		66	c:\tmp\release\test.exe	66000	4.638088545
1	13680	17776	701	CIexp_pentium4		CIexp_pentium4	test.exe		65	c:\tmp\release\test.exe	65000	4.567814476
1	7425	11521	51	ctrandisp1		ctrandisp1	test.exe		49	c:\tmp\release\test.exe	49000	3.443429375
1	2592	6688	59	exp		exp	test.exe		45	c:\tmp\release\test.exe	45000	3.162333099
1	1216	5312	63	log		log	test.exe		37	c:\tmp\release\test.exe	37000	2.600140548
1	720	4816	186	CIsqrt		CIsqrt	test.exe		35	c:\tmp\release\test.exe	35000	2.45959241
1	912	5008	63	cos		cos	test.exe		35	c:\tmp\release\test.exe	35000	2.45959241
1	0	4096	96	arr		arr	test.exe	c:\tmp\test.c	29	c:\tmp\release\test.exe	29000	2.037947997
1	2256	6352	63	log10		log10	test.exe		28	c:\tmp\release\test.exe	28000	1.967673928
1	2736	6832	63	sin		sin	test.exe		28	c:\tmp\release\test.exe	28000	1.967673928
1	96	4192	92	poi		poi	test.exe	c:\tmp\test.c	20	c:\tmp\release\test.exe	20000	1.405481377
1	4677	8773	67	fload_withFB		fload_withFB	test.exe		11	c:\tmp\release\test.exe	11000	0.773014758
1	1569	5665	7	tanh		tanh	test.exe		9	c:\tmp\release\test.exe	9000	0.63246662
1	1562	5658	7	cosh		cosh	test.exe		8	c:\tmp\release\test.exe	8000	0.562192551
1	1552	5648	10	sinh		sinh	test.exe		8	c:\tmp\release\test.exe	8000	0.562192551
1	4779	8875	42	math_exit		math_exit	test.exe		2	c:\tmp\release\test.exe	2000	0.140548138
1	608	4704	88	main		main	test.exe	c:\tmp\test.c	2	c:\tmp\release\test.exe	2000	0.140548138

Switch is 5.97% of execution time.
Pointer is 1.4% of execution time.

4 times faster.

Of course, as always, YMMV.

Hardware was:
 CPU Identification utility v1.11                 (c) 1997-2005 Jan Steunebrink

„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ
 CPU Vendor and Model: AMD Athlon 64 2800+-3700+
 Internal CPU speed  : 2199.4 MHz  (using internal Time Stamp Counter)
 Clock Multiplier    : Available only in Real Mode!
 CPU-ID Vendor string: AuthenticAMD
 CPU-ID Name string  : AMD Athlon(tm) 64 Processor 3400+
 CPU-ID Signature    : 0F4A
                       „ „ „ „¤„Ÿ Stepping or sub-model no.
                       „ „ „¤„Ÿ Model: Indicates CPU Model and 486 L1 cache mode
                       „ „¤„Ÿ Family: 4=486, Am5x86, Cx5x86
                       „            5=Pentium, Nx586, Cx6x86, K5/K6, C6, mP6
                       „            6=PentiumPro/II/III, CxMII/III, Athlon, C3
                       „            F=Pentium4, Athlon64
                       „¤„Ÿ Type: 0=Standard, 1=Overdrive, 2=2nd Dual Pentium
 Current CPU mode    : Protected
 Internal (L1) cache : Enabled in Write-Back mode



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.