Author: Eugene Nalimov
Date: 14:11:33 11/19/02
Go up one level in this thread
Sorry, probably too much details were lost during explanation. Here is simple
example:
int foo (int i)
{
switch (i) {
case 10:
i += 1;
break;
case 11:
i += 2;
break;
case 12:
i += 3;
break;
case 13:
i += 4;
break;
case 14:
i += 5;
break;
case 15:
i += 6;
break;
case 16:
i += 7;
break;
case 17:
i += 8;
break;
}
return bar(i*i);
}
And here is what compiler can generate for it (I compiled it with -O1 flag,
optimizing for space, to simplify some things):
_i$ = 8 ; size = 4
_foo PROC NEAR ; COMDAT
mov eax, DWORD PTR _i$[esp-4]
lea ecx, DWORD PTR [eax-10]
cmp ecx, 7
ja SHORT $L473
jmp DWORD PTR $L494[ecx*4]
$L476:
inc eax
jmp SHORT $L473
$L477:
inc eax
inc eax
jmp SHORT $L473
$L478:
add eax, 3
jmp SHORT $L473
$L479:
add eax, 4
jmp SHORT $L473
$L480:
add eax, 5
jmp SHORT $L473
$L481:
add eax, 6
jmp SHORT $L473
$L482:
add eax, 7
jmp SHORT $L473
$L483:
add eax, 8
$L473:
mov ecx, eax
imul ecx, eax
push ecx
call _bar
pop ecx
ret 0
$L494:
DD $L476
DD $L477
DD $L478
DD $L479
DD $L480
DD $L481
DD $L482
DD $L483
_foo ENDP
So, we have 2 branches -- one for the case when value is out of the range, and
second is the indirect branch that is very heard to predict. On VC you can
explain the compiler that there are no out-of-range values and get rid of the
fisrt branch, but indirect branch is there to stay.
If you are optimizing for speed and using profile feedback, sometimes compiler
can get rid of the indirect branch for most frequent cases. I.e. if compiler
knows that in 90% of cases i equals 12, than it can generate something like
_i$ = 8 ; size = 4
_foo PROC NEAR ; COMDAT
mov eax, DWORD PTR _i$[esp-4]
cmp eax, 12
jne L1
add eax, 3
$L473:
mov ecx, eax
imul ecx, eax
push ecx
call _bar
pop ecx
ret 0
L1:
lea ecx, DWORD PTR [eax-10]
cmp ecx, 7
ja SHORT $L473
jmp DWORD PTR $L494[ecx*4]
$L476:
inc eax
jmp SHORT $L473
$L477:
add eax, 2
jmp SHORT $L473
$L478:
add eax, 3
jmp SHORT $L473
$L479:
add eax, 4
jmp SHORT $L473
$L480:
add eax, 5
jmp SHORT $L473
$L481:
add eax, 6
jmp SHORT $L473
$L482:
add eax, 7
jmp SHORT $L473
$L483:
add eax, 8
jmp SHORT $L473
$L494:
DD $L476
DD $L477
DD 0
DD $L479
DD $L480
DD $L481
DD $L482
DD $L483
_foo ENDP
Now for most often (and almost always predicted) case there is no branch
misprediction, but for other cases there are 2 mispredictions. If we assume that
misprediction penalty is 15 cycles, and we don't call function with out-of-range
value, for the "naive" code switch cost is 5 instructions, one of them is
mispredicted branch, so total is ~20 cycles. For the "profile-guided" code
switch cost is 3 instructions (no mispredicted branches) in 90% of cases, and 7
instructions (2 mispredicted branches) in 10% of cases, so total switch cost is
0.9*3 + 0.1*37 == less than 7 cycles.
Thanks,
Eugene
On November 19, 2002 at 16:41:11, Robert Hyatt wrote:
>On November 19, 2002 at 16:29:41, Eugene Nalimov wrote:
>
>>You don't need asm file to figure out what is going on. For a large switch table
>>compiler generated jump table, and code should look like
>> cmp eax, upper_switch_value+1
>> jnc default_case
>> mov eax, jump_table[eax*4]
>> jmp eax
>>Here indirect branch is not better than indirect function call. Plus, as Gunnar
>>mentioned, there is some overhead.
>
>
>What about a jump to a jump instead? That ought to be more predictable as the
>second jump will _always_ be taken and to the same target address. IE the first
>"switch" I saw did something like this:
>
>assuming ebx has the switch case (0,1,..., N-1)
><check range>
>lea eax, jumptable+ebx*4
>jmp [eax]
>
>And jumptable looked like this:
>
>jumptable: jmp case0
> jmp case1
>
>etc...
>
>That is loosely translated into x86 assembly as this was not an x86
>architecture,
>and yes, the ebx*4 is not quite the right number, but it was the idea rather
>than
>trying to adjust ebx by a non-power-of-2 and I chose the easy way for
>illustration...
>
>the jmp [eax] can't be predicted, but the rest in the jump table can be.
>
>
>
>>
>>Thanks,
>>Eugene
>>
>>On November 19, 2002 at 14:40:50, Robert Hyatt wrote:
>>
>>>On November 19, 2002 at 04:41:59, Gunnar Andersson wrote:
>>>
>>>>On November 18, 2002 at 20:09:27, Eugene Nalimov wrote:
>>>>
>>>>>Indirect call costs *at least* the same as conditional branch, sometimes much
>>>>>more.
>>>>
>>>>I've discovered that arrays of function pointers can outperform switch
>>>>statements, at least with GCC and many cases. To me the generated assembly code
>>>>for the switch statement contains a jump table anyway, and there's some extra
>>>>assembly code essentially performing range checking.
>>>>
>>>>/ Gunnar
>>>
>>>
>>>It would be interesting to look at the .s file from gcc to see what it is doing.
>>> The
>>>indirect calls should be bad. Of course the range checking in a switch is also
>>>bad,
>>>but I have seen compilers that would let me turn this off...
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.