Author: Gerd Isenberg
Date: 14:11:04 01/19/06
Go up one level in this thread
On January 19, 2006 at 15:58:20, Dann Corbit wrote:
>On January 19, 2006 at 15:19:22, Gerd Isenberg wrote:
>
>>>>>assert(expression == 0 || expression == 1);
>>>>>foo[expression]();
>>>>
>>>>No - avoid that for simple expressions!
>>>
>>>You might be surprised:
>>>http://chessprogramming.org/cccsearch/ccc.php?art_id=178068
>>>
>>>Depends on the compiler, of course.
>>
>>Ok, a few years old - it was quite ok on P3 iirc.
>>To my knowledge on recent processors (amd64) indirect branches with (not
>>necessarily random) changed target addresses implies a missprediction.
>>Also i would never replace a simple array access by indrect branch.
>>
>> a = x[boolexpression];
>> a = foo[boolexpression](); // returns either x[false] or x[true]
>
>The program:
>#include <math.h>
>#include <stdio.h>
>
>typedef double (*f_t) (double);
>static f_t f[] =
>{log, log10, sqrt, cos, cosh, exp, sin, sinh, tan, tanh, 0};
>
>static double accum0 = 0;
>static double accum1 = 0;
>static double accum2 = 0;
>
>
>void arr(void)
>{
> int i;
> double d = 0;
> for (i = 0; f[i]; i++) {
> d += f[i] (0.5);
> }
> accum0 += d;
>}
>
>void poi(void)
>{
> f_t *flist = f;
> double d = 0;
> while (*flist) {
> f_t ff = *flist;
> d += ff(0.5);
> flist++;
> }
> accum1 += d;
>}
>
>void swi(void)
>{
> int i;
> double d = 0;
> for (i = 0; f[i]; i++) {
> switch (i) {
> case 0:
> d += f[0] (0.5);
> break;
> case 1:
> d += f[1] (0.5);
> break;
> case 2:
> d += f[2] (0.5);
> break;
> case 3:
> d += f[3] (0.5);
> break;
> case 4:
> d += f[4] (0.5);
> break;
> case 5:
> d += f[5] (0.5);
> break;
> case 6:
> d += f[6] (0.5);
> break;
> case 7:
> d += f[7] (0.5);
> break;
> case 8:
> d += f[8] (0.5);
> break;
> case 9:
> d += f[9] (0.5);
> break;
> default:
> break;
> }
> }
> accum2 += d;
>}
>
>int main(void)
>{
> long i;
> for (i = 0; i < 1000000; i++) {
> arr();
> poi();
> swi();
> }
> printf("%.20g, %.20g, %.20g\n", accum0, accum1, accum2);
> return 0;
>}
>
>given compiler flags:
>/Ox /Ob2 /Oi /Ot /Oy /GT /GL /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D
>"_VC80_UPGRADE=0x0600" /D "_MBCS" /GF /FD /MT /Zp16 /Gy /Fp".\Release/test.pch"
>/Fo".\Release/" /Fd".\Release/" /W3 /nologo /c /Zi /Gd /TP /errorReport:prompt
>
>for compiler:
>icrosoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.42 for 80x86
>opyright (C) Microsoft Corporation. All rights reserved.
>
>Profile data:
>Segment Offset Address Size Function Class Full Name Module Source File Timer
>Samples Resolved Module Path Timer events Timer %
>1 7536 11632 596 CItan_pentium4 CItan_pentium4 test.exe 156 c:\tmp\release\test.exe 156000 10.96275474
>1 3904 8000 103 trandisp1 trandisp1 test.exe 149 c:\tmp\release\test.exe 149000 10.47083626
>1 7019 11115 406 ctrandisp2 ctrandisp2 test.exe 104 c:\tmp\release\test.exe 104000 7.308503162
>1 5488 9584 620 CIlog_pentium4 CIlog_pentium4 test.exe 101 c:\tmp\release\test.exe 101000 7.097680956
>1 5056 9152 428 CIcos_pentium4 CIcos_pentium4 test.exe 100 c:\tmp\release\test.exe 100000 7.027406887
>1 14384 18480 456 CIsin_pentium4 CIsin_pentium4 test.exe 86 c:\tmp\release\test.exe 86000 6.043569923
>1 13024 17120 644 CIlog10_pentium4 CIlog10_pentium4 test.exe 85 c:\tmp\release\test.exe 85000 5.973295854
>1 192 4288 408 swi swi test.exe c:\tmp\test.c 85 c:\tmp\release\test.exe 85000 5.973295854
>1 7476 11572 60 fload fload test.exe 80 c:\tmp\release\test.exe 80000 5.621925509
>1 1600 5696 63 tan tan test.exe 66 c:\tmp\release\test.exe 66000 4.638088545
>1 13680 17776 701 CIexp_pentium4 CIexp_pentium4 test.exe 65 c:\tmp\release\test.exe 65000 4.567814476
>1 7425 11521 51 ctrandisp1 ctrandisp1 test.exe 49 c:\tmp\release\test.exe 49000 3.443429375
>1 2592 6688 59 exp exp test.exe 45 c:\tmp\release\test.exe 45000 3.162333099
>1 1216 5312 63 log log test.exe 37 c:\tmp\release\test.exe 37000 2.600140548
>1 720 4816 186 CIsqrt CIsqrt test.exe 35 c:\tmp\release\test.exe 35000 2.45959241
>1 912 5008 63 cos cos test.exe 35 c:\tmp\release\test.exe 35000 2.45959241
>1 0 4096 96 arr arr test.exe c:\tmp\test.c 29 c:\tmp\release\test.exe 29000 2.037947997
>1 2256 6352 63 log10 log10 test.exe 28 c:\tmp\release\test.exe 28000 1.967673928
>1 2736 6832 63 sin sin test.exe 28 c:\tmp\release\test.exe 28000 1.967673928
>1 96 4192 92 poi poi test.exe c:\tmp\test.c 20 c:\tmp\release\test.exe 20000 1.405481377
>1 4677 8773 67 fload_withFB fload_withFB test.exe 11 c:\tmp\release\test.exe 11000 0.773014758
>1 1569 5665 7 tanh tanh test.exe 9 c:\tmp\release\test.exe 9000 0.63246662
>1 1562 5658 7 cosh cosh test.exe 8 c:\tmp\release\test.exe 8000 0.562192551
>1 1552 5648 10 sinh sinh test.exe 8 c:\tmp\release\test.exe 8000 0.562192551
>1 4779 8875 42 math_exit math_exit test.exe 2 c:\tmp\release\test.exe 2000 0.140548138
>1 608 4704 88 main main test.exe c:\tmp\test.c 2 c:\tmp\release\test.exe 2000 0.140548138
>
>Switch is 5.97% of execution time.
>Pointer is 1.4% of execution time.
This has nothing to do with getting preinitialized data from arrays.
It is switch aka indirect jump verus indirect call.
You may try an explicite if-else-else cascade ;-)
>
>4 times faster.
Strange!? Switch is most likely conditional jmp with a leading range check. So i
don't trust that profiling so much with soo short routines amd probably huge
"sampling" errors (i guess). Also the return overhead "pairs" with the x87
instructions, which favors the call a bit.
>
>Of course, as always, YMMV.
>
>Hardware was:
> CPU Identification utility v1.11 (c) 1997-2005 Jan Steunebrink
>
>„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ„Ÿ
> CPU Vendor and Model: AMD Athlon 64 2800+-3700+
> Internal CPU speed : 2199.4 MHz (using internal Time Stamp Counter)
> Clock Multiplier : Available only in Real Mode!
> CPU-ID Vendor string: AuthenticAMD
> CPU-ID Name string : AMD Athlon(tm) 64 Processor 3400+
> CPU-ID Signature : 0F4A
> „ „ „ „¤„Ÿ Stepping or sub-model no.
> „ „ „¤„Ÿ Model: Indicates CPU Model and 486 L1 cache mode
> „ „¤„Ÿ Family: 4=486, Am5x86, Cx5x86
> „ 5=Pentium, Nx586, Cx6x86, K5/K6, C6, mP6
> „ 6=PentiumPro/II/III, CxMII/III, Athlon, C3
> „ F=Pentium4, Athlon64
> „¤„Ÿ Type: 0=Standard, 1=Overdrive, 2=2nd Dual Pentium
> Current CPU mode : Protected
> Internal (L1) cache : Enabled in Write-Back mode
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.