Author: Dann Corbit
Date: 12:58:20 01/19/06
Go up one level in this thread
On January 19, 2006 at 15:19:22, Gerd Isenberg wrote:
>>>>assert(expression == 0 || expression == 1);
>>>>foo[expression]();
>>>
>>>No - avoid that for simple expressions!
>>
>>You might be surprised:
>>http://chessprogramming.org/cccsearch/ccc.php?art_id=178068
>>
>>Depends on the compiler, of course.
>
>Ok, a few years old - it was quite ok on P3 iirc.
>To my knowledge on recent processors (amd64) indirect branches with (not
>necessarily random) changed target addresses implies a missprediction.
>Also i would never replace a simple array access by indrect branch.
>
> a = x[boolexpression];
> a = foo[boolexpression](); // returns either x[false] or x[true]
The program:
#include <math.h>
#include <stdio.h>
typedef double (*f_t) (double);
static f_t f[] =
{log, log10, sqrt, cos, cosh, exp, sin, sinh, tan, tanh, 0};
static double accum0 = 0;
static double accum1 = 0;
static double accum2 = 0;
void arr(void)
{
int i;
double d = 0;
for (i = 0; f[i]; i++) {
d += f[i] (0.5);
}
accum0 += d;
}
void poi(void)
{
f_t *flist = f;
double d = 0;
while (*flist) {
f_t ff = *flist;
d += ff(0.5);
flist++;
}
accum1 += d;
}
void swi(void)
{
int i;
double d = 0;
for (i = 0; f[i]; i++) {
switch (i) {
case 0:
d += f[0] (0.5);
break;
case 1:
d += f[1] (0.5);
break;
case 2:
d += f[2] (0.5);
break;
case 3:
d += f[3] (0.5);
break;
case 4:
d += f[4] (0.5);
break;
case 5:
d += f[5] (0.5);
break;
case 6:
d += f[6] (0.5);
break;
case 7:
d += f[7] (0.5);
break;
case 8:
d += f[8] (0.5);
break;
case 9:
d += f[9] (0.5);
break;
default:
break;
}
}
accum2 += d;
}
int main(void)
{
long i;
for (i = 0; i < 1000000; i++) {
arr();
poi();
swi();
}
printf("%.20g, %.20g, %.20g\n", accum0, accum1, accum2);
return 0;
}
given compiler flags:
/Ox /Ob2 /Oi /Ot /Oy /GT /GL /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D
"_VC80_UPGRADE=0x0600" /D "_MBCS" /GF /FD /MT /Zp16 /Gy /Fp".\Release/test.pch"
/Fo".\Release/" /Fd".\Release/" /W3 /nologo /c /Zi /Gd /TP /errorReport:prompt
for compiler:
icrosoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.42 for 80x86
opyright (C) Microsoft Corporation. All rights reserved.
Profile data:
Segment Offset Address Size Function Class Full Name Module Source File Timer
Samples Resolved Module Path Timer events Timer %
1 7536 11632 596 CItan_pentium4 CItan_pentium4 test.exe 156 c:\tmp\release\test.exe 156000 10.96275474
1 3904 8000 103 trandisp1 trandisp1 test.exe 149 c:\tmp\release\test.exe 149000 10.47083626
1 7019 11115 406 ctrandisp2 ctrandisp2 test.exe 104 c:\tmp\release\test.exe 104000 7.308503162
1 5488 9584 620 CIlog_pentium4 CIlog_pentium4 test.exe 101 c:\tmp\release\test.exe 101000 7.097680956
1 5056 9152 428 CIcos_pentium4 CIcos_pentium4 test.exe 100 c:\tmp\release\test.exe 100000 7.027406887
1 14384 18480 456 CIsin_pentium4 CIsin_pentium4 test.exe 86 c:\tmp\release\test.exe 86000 6.043569923
1 13024 17120 644 CIlog10_pentium4 CIlog10_pentium4 test.exe 85 c:\tmp\release\test.exe 85000 5.973295854
1 192 4288 408 swi swi test.exe c:\tmp\test.c 85 c:\tmp\release\test.exe 85000 5.973295854
1 7476 11572 60 fload fload test.exe 80 c:\tmp\release\test.exe 80000 5.621925509
1 1600 5696 63 tan tan test.exe 66 c:\tmp\release\test.exe 66000 4.638088545
1 13680 17776 701 CIexp_pentium4 CIexp_pentium4 test.exe 65 c:\tmp\release\test.exe 65000 4.567814476
1 7425 11521 51 ctrandisp1 ctrandisp1 test.exe 49 c:\tmp\release\test.exe 49000 3.443429375
1 2592 6688 59 exp exp test.exe 45 c:\tmp\release\test.exe 45000 3.162333099
1 1216 5312 63 log log test.exe 37 c:\tmp\release\test.exe 37000 2.600140548
1 720 4816 186 CIsqrt CIsqrt test.exe 35 c:\tmp\release\test.exe 35000 2.45959241
1 912 5008 63 cos cos test.exe 35 c:\tmp\release\test.exe 35000 2.45959241
1 0 4096 96 arr arr test.exe c:\tmp\test.c 29 c:\tmp\release\test.exe 29000 2.037947997
1 2256 6352 63 log10 log10 test.exe 28 c:\tmp\release\test.exe 28000 1.967673928
1 2736 6832 63 sin sin test.exe 28 c:\tmp\release\test.exe 28000 1.967673928
1 96 4192 92 poi poi test.exe c:\tmp\test.c 20 c:\tmp\release\test.exe 20000 1.405481377
1 4677 8773 67 fload_withFB fload_withFB test.exe 11 c:\tmp\release\test.exe 11000 0.773014758
1 1569 5665 7 tanh tanh test.exe 9 c:\tmp\release\test.exe 9000 0.63246662
1 1562 5658 7 cosh cosh test.exe 8 c:\tmp\release\test.exe 8000 0.562192551
1 1552 5648 10 sinh sinh test.exe 8 c:\tmp\release\test.exe 8000 0.562192551
1 4779 8875 42 math_exit math_exit test.exe 2 c:\tmp\release\test.exe 2000 0.140548138
1 608 4704 88 main main test.exe c:\tmp\test.c 2 c:\tmp\release\test.exe 2000 0.140548138
Switch is 5.97% of execution time.
Pointer is 1.4% of execution time.
4 times faster.
Of course, as always, YMMV.
Hardware was:
CPU Identification utility v1.11 (c) 1997-2005 Jan Steunebrink
„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ
CPU Vendor and Model: AMD Athlon 64 2800+-3700+
Internal CPU speed : 2199.4 MHz (using internal Time Stamp Counter)
Clock Multiplier : Available only in Real Mode!
CPU-ID Vendor string: AuthenticAMD
CPU-ID Name string : AMD Athlon(tm) 64 Processor 3400+
CPU-ID Signature : 0F4A
„ „ „ „¤„Ÿ Stepping or sub-model no.
„ „ „¤„Ÿ Model: Indicates CPU Model and 486 L1 cache mode
„ „¤„Ÿ Family: 4=486, Am5x86, Cx5x86
„ 5=Pentium, Nx586, Cx6x86, K5/K6, C6, mP6
„ 6=PentiumPro/II/III, CxMII/III, Athlon, C3
„ F=Pentium4, Athlon64
„¤„Ÿ Type: 0=Standard, 1=Overdrive, 2=2nd Dual Pentium
Current CPU mode : Protected
Internal (L1) cache : Enabled in Write-Back mode
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.