Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Avoiding if ( expr ) and branch misprediction ?

Author: Gerd Isenberg

Date: 14:11:04 01/19/06

Go up one level in this thread


On January 19, 2006 at 15:58:20, Dann Corbit wrote:

>On January 19, 2006 at 15:19:22, Gerd Isenberg wrote:
>
>>>>>assert(expression == 0 || expression == 1);
>>>>>foo[expression]();
>>>>
>>>>No - avoid that for simple expressions!
>>>
>>>You might be surprised:
>>>http://chessprogramming.org/cccsearch/ccc.php?art_id=178068
>>>
>>>Depends on the compiler, of course.
>>
>>Ok, a few years old - it was quite ok on P3 iirc.
>>To my knowledge on recent processors (amd64) indirect branches with (not
>>necessarily random) changed target addresses implies a missprediction.
>>Also i would never replace a simple array access by indrect branch.
>>
>> a = x[boolexpression];
>> a = foo[boolexpression](); // returns either x[false] or x[true]
>
>The program:
>#include <math.h>
>#include <stdio.h>
>
>typedef double  (*f_t) (double);
>static f_t      f[] =
>{log, log10, sqrt, cos, cosh, exp, sin, sinh, tan, tanh, 0};
>
>static double   accum0 = 0;
>static double   accum1 = 0;
>static double   accum2 = 0;
>
>
>void            arr(void)
>{
>    int             i;
>    double          d = 0;
>    for (i = 0; f[i]; i++) {
>        d += f[i] (0.5);
>    }
>    accum0 += d;
>}
>
>void            poi(void)
>{
>    f_t            *flist = f;
>    double          d = 0;
>    while (*flist) {
>        f_t             ff = *flist;
>        d += ff(0.5);
>        flist++;
>    }
>    accum1 += d;
>}
>
>void            swi(void)
>{
>    int             i;
>    double          d = 0;
>    for (i = 0; f[i]; i++) {
>        switch (i) {
>        case 0:
>            d += f[0] (0.5);
>            break;
>        case 1:
>            d += f[1] (0.5);
>            break;
>        case 2:
>            d += f[2] (0.5);
>            break;
>        case 3:
>            d += f[3] (0.5);
>            break;
>        case 4:
>            d += f[4] (0.5);
>            break;
>        case 5:
>            d += f[5] (0.5);
>            break;
>        case 6:
>            d += f[6] (0.5);
>            break;
>        case 7:
>            d += f[7] (0.5);
>            break;
>        case 8:
>            d += f[8] (0.5);
>            break;
>        case 9:
>            d += f[9] (0.5);
>            break;
>        default:
>            break;
>        }
>    }
>    accum2 += d;
>}
>
>int             main(void)
>{
>    long            i;
>    for (i = 0; i < 1000000; i++) {
>        arr();
>        poi();
>        swi();
>    }
>    printf("%.20g, %.20g, %.20g\n", accum0, accum1, accum2);
>    return 0;
>}
>
>given compiler flags:
>/Ox /Ob2 /Oi /Ot /Oy /GT /GL /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D
>"_VC80_UPGRADE=0x0600" /D "_MBCS" /GF /FD /MT /Zp16 /Gy /Fp".\Release/test.pch"
>/Fo".\Release/" /Fd".\Release/" /W3 /nologo /c /Zi /Gd /TP /errorReport:prompt
>
>for compiler:
>icrosoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.42 for 80x86
>opyright (C) Microsoft Corporation.  All rights reserved.
>
>Profile data:
>Segment	Offset	Address	Size	Function	Class	Full Name	Module	Source File	Timer
>Samples	Resolved Module Path	Timer events	Timer %
>1	7536	11632	596	CItan_pentium4		CItan_pentium4	test.exe		156	c:\tmp\release\test.exe	156000	10.96275474
>1	3904	8000	103	trandisp1		trandisp1	test.exe		149	c:\tmp\release\test.exe	149000	10.47083626
>1	7019	11115	406	ctrandisp2		ctrandisp2	test.exe		104	c:\tmp\release\test.exe	104000	7.308503162
>1	5488	9584	620	CIlog_pentium4		CIlog_pentium4	test.exe		101	c:\tmp\release\test.exe	101000	7.097680956
>1	5056	9152	428	CIcos_pentium4		CIcos_pentium4	test.exe		100	c:\tmp\release\test.exe	100000	7.027406887
>1	14384	18480	456	CIsin_pentium4		CIsin_pentium4	test.exe		86	c:\tmp\release\test.exe	86000	6.043569923
>1	13024	17120	644	CIlog10_pentium4		CIlog10_pentium4	test.exe		85	c:\tmp\release\test.exe	85000	5.973295854
>1	192	4288	408	swi		swi	test.exe	c:\tmp\test.c	85	c:\tmp\release\test.exe	85000	5.973295854
>1	7476	11572	60	fload		fload	test.exe		80	c:\tmp\release\test.exe	80000	5.621925509
>1	1600	5696	63	tan		tan	test.exe		66	c:\tmp\release\test.exe	66000	4.638088545
>1	13680	17776	701	CIexp_pentium4		CIexp_pentium4	test.exe		65	c:\tmp\release\test.exe	65000	4.567814476
>1	7425	11521	51	ctrandisp1		ctrandisp1	test.exe		49	c:\tmp\release\test.exe	49000	3.443429375
>1	2592	6688	59	exp		exp	test.exe		45	c:\tmp\release\test.exe	45000	3.162333099
>1	1216	5312	63	log		log	test.exe		37	c:\tmp\release\test.exe	37000	2.600140548
>1	720	4816	186	CIsqrt		CIsqrt	test.exe		35	c:\tmp\release\test.exe	35000	2.45959241
>1	912	5008	63	cos		cos	test.exe		35	c:\tmp\release\test.exe	35000	2.45959241
>1	0	4096	96	arr		arr	test.exe	c:\tmp\test.c	29	c:\tmp\release\test.exe	29000	2.037947997
>1	2256	6352	63	log10		log10	test.exe		28	c:\tmp\release\test.exe	28000	1.967673928
>1	2736	6832	63	sin		sin	test.exe		28	c:\tmp\release\test.exe	28000	1.967673928
>1	96	4192	92	poi		poi	test.exe	c:\tmp\test.c	20	c:\tmp\release\test.exe	20000	1.405481377
>1	4677	8773	67	fload_withFB		fload_withFB	test.exe		11	c:\tmp\release\test.exe	11000	0.773014758
>1	1569	5665	7	tanh		tanh	test.exe		9	c:\tmp\release\test.exe	9000	0.63246662
>1	1562	5658	7	cosh		cosh	test.exe		8	c:\tmp\release\test.exe	8000	0.562192551
>1	1552	5648	10	sinh		sinh	test.exe		8	c:\tmp\release\test.exe	8000	0.562192551
>1	4779	8875	42	math_exit		math_exit	test.exe		2	c:\tmp\release\test.exe	2000	0.140548138
>1	608	4704	88	main		main	test.exe	c:\tmp\test.c	2	c:\tmp\release\test.exe	2000	0.140548138
>
>Switch is 5.97% of execution time.
>Pointer is 1.4% of execution time.


This has nothing to do with getting preinitialized data from arrays.
It is switch aka indirect jump verus indirect call.
You may try an explicite if-else-else cascade ;-)


>
>4 times faster.

Strange!? Switch is most likely conditional jmp with a leading range check. So i
don't trust that profiling so much with soo short routines amd probably huge
"sampling" errors (i guess). Also the return overhead "pairs" with the x87
instructions, which favors the call a bit.


>
>Of course, as always, YMMV.
>
>Hardware was:
> CPU Identification utility v1.11                 (c) 1997-2005 Jan Steunebrink
>
>„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ
> CPU Vendor and Model: AMD Athlon 64 2800+-3700+
> Internal CPU speed  : 2199.4 MHz  (using internal Time Stamp Counter)
> Clock Multiplier    : Available only in Real Mode!
> CPU-ID Vendor string: AuthenticAMD
> CPU-ID Name string  : AMD Athlon(tm) 64 Processor 3400+
> CPU-ID Signature    : 0F4A
>                       „ „ „ „¤„Ÿ Stepping or sub-model no.
>                       „ „ „¤„Ÿ Model: Indicates CPU Model and 486 L1 cache mode
>                       „ „¤„Ÿ Family: 4=486, Am5x86, Cx5x86
>                       „            5=Pentium, Nx586, Cx6x86, K5/K6, C6, mP6
>                       „            6=PentiumPro/II/III, CxMII/III, Athlon, C3
>                       „            F=Pentium4, Athlon64
>                       „¤„Ÿ Type: 0=Standard, 1=Overdrive, 2=2nd Dual Pentium
> Current CPU mode    : Protected
> Internal (L1) cache : Enabled in Write-Back mode



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.