Author: Robert Hyatt
Date: 15:08:59 11/19/02
Go up one level in this thread
On November 19, 2002 at 17:11:33, Eugene Nalimov wrote:
>Sorry, probably too much details were lost during explanation. Here is simple
>example:
>
>int foo (int i)
>{
> switch (i) {
> case 10:
> i += 1;
> break;
> case 11:
> i += 2;
> break;
> case 12:
> i += 3;
> break;
> case 13:
> i += 4;
> break;
> case 14:
> i += 5;
> break;
> case 15:
> i += 6;
> break;
> case 16:
> i += 7;
> break;
> case 17:
> i += 8;
> break;
> }
> return bar(i*i);
>}
>
>And here is what compiler can generate for it (I compiled it with -O1 flag,
>optimizing for space, to simplify some things):
>
>_i$ = 8 ; size = 4
>_foo PROC NEAR ; COMDAT
> mov eax, DWORD PTR _i$[esp-4]
> lea ecx, DWORD PTR [eax-10]
> cmp ecx, 7
> ja SHORT $L473
> jmp DWORD PTR $L494[ecx*4]
>$L476:
> inc eax
> jmp SHORT $L473
>$L477:
> inc eax
> inc eax
> jmp SHORT $L473
>$L478:
> add eax, 3
> jmp SHORT $L473
>$L479:
> add eax, 4
> jmp SHORT $L473
>$L480:
> add eax, 5
> jmp SHORT $L473
>$L481:
> add eax, 6
> jmp SHORT $L473
>$L482:
> add eax, 7
> jmp SHORT $L473
>$L483:
> add eax, 8
>$L473:
> mov ecx, eax
> imul ecx, eax
> push ecx
> call _bar
> pop ecx
> ret 0
>$L494:
> DD $L476
> DD $L477
> DD $L478
> DD $L479
> DD $L480
> DD $L481
> DD $L482
> DD $L483
>_foo ENDP
>
>So, we have 2 branches -- one for the case when value is out of the range, and
>second is the indirect branch that is very heard to predict. On VC you can
>explain the compiler that there are no out-of-range values and get rid of the
>fisrt branch, but indirect branch is there to stay.
>
>If you are optimizing for speed and using profile feedback, sometimes compiler
>can get rid of the indirect branch for most frequent cases. I.e. if compiler
>knows that in 90% of cases i equals 12, than it can generate something like
>
>_i$ = 8 ; size = 4
>_foo PROC NEAR ; COMDAT
> mov eax, DWORD PTR _i$[esp-4]
> cmp eax, 12
> jne L1
> add eax, 3
>$L473:
> mov ecx, eax
> imul ecx, eax
> push ecx
> call _bar
> pop ecx
> ret 0
>L1:
> lea ecx, DWORD PTR [eax-10]
> cmp ecx, 7
> ja SHORT $L473
> jmp DWORD PTR $L494[ecx*4]
>$L476:
> inc eax
> jmp SHORT $L473
>$L477:
> add eax, 2
> jmp SHORT $L473
>$L478:
> add eax, 3
> jmp SHORT $L473
>$L479:
> add eax, 4
> jmp SHORT $L473
>$L480:
> add eax, 5
> jmp SHORT $L473
>$L481:
> add eax, 6
> jmp SHORT $L473
>$L482:
> add eax, 7
> jmp SHORT $L473
>$L483:
> add eax, 8
> jmp SHORT $L473
>$L494:
> DD $L476
> DD $L477
> DD 0
> DD $L479
> DD $L480
> DD $L481
> DD $L482
> DD $L483
>_foo ENDP
>
>Now for most often (and almost always predicted) case there is no branch
>misprediction, but for other cases there are 2 mispredictions. If we assume that
>misprediction penalty is 15 cycles, and we don't call function with out-of-range
>value, for the "naive" code switch cost is 5 instructions, one of them is
>mispredicted branch, so total is ~20 cycles. For the "profile-guided" code
>switch cost is 3 instructions (no mispredicted branches) in 90% of cases, and 7
>instructions (2 mispredicted branches) in 10% of cases, so total switch cost is
>0.9*3 + 0.1*37 == less than 7 cycles.
>
>Thanks,
>Eugene
Does it handle the case where it is often more efficient to do something, then
undo it
if necessary? IE the usual if-then-else case where you replace
if (c)
do x;
else
do y;
with
do y
if (c) {
undo y
do x
}
A lot of this seems awfully hard for the optimizer to handle, but I haven't
looked at MSVC
code in a long while. gcc is ok... but _only_ ok...
>
>
>On November 19, 2002 at 16:41:11, Robert Hyatt wrote:
>
>>On November 19, 2002 at 16:29:41, Eugene Nalimov wrote:
>>
>>>You don't need asm file to figure out what is going on. For a large switch table
>>>compiler generated jump table, and code should look like
>>> cmp eax, upper_switch_value+1
>>> jnc default_case
>>> mov eax, jump_table[eax*4]
>>> jmp eax
>>>Here indirect branch is not better than indirect function call. Plus, as Gunnar
>>>mentioned, there is some overhead.
>>
>>
>>What about a jump to a jump instead? That ought to be more predictable as the
>>second jump will _always_ be taken and to the same target address. IE the first
>>"switch" I saw did something like this:
>>
>>assuming ebx has the switch case (0,1,..., N-1)
>><check range>
>>lea eax, jumptable+ebx*4
>>jmp [eax]
>>
>>And jumptable looked like this:
>>
>>jumptable: jmp case0
>> jmp case1
>>
>>etc...
>>
>>That is loosely translated into x86 assembly as this was not an x86
>>architecture,
>>and yes, the ebx*4 is not quite the right number, but it was the idea rather
>>than
>>trying to adjust ebx by a non-power-of-2 and I chose the easy way for
>>illustration...
>>
>>the jmp [eax] can't be predicted, but the rest in the jump table can be.
>>
>>
>>
>>>
>>>Thanks,
>>>Eugene
>>>
>>>On November 19, 2002 at 14:40:50, Robert Hyatt wrote:
>>>
>>>>On November 19, 2002 at 04:41:59, Gunnar Andersson wrote:
>>>>
>>>>>On November 18, 2002 at 20:09:27, Eugene Nalimov wrote:
>>>>>
>>>>>>Indirect call costs *at least* the same as conditional branch, sometimes much
>>>>>>more.
>>>>>
>>>>>I've discovered that arrays of function pointers can outperform switch
>>>>>statements, at least with GCC and many cases. To me the generated assembly code
>>>>>for the switch statement contains a jump table anyway, and there's some extra
>>>>>assembly code essentially performing range checking.
>>>>>
>>>>>/ Gunnar
>>>>
>>>>
>>>>It would be interesting to look at the .s file from gcc to see what it is doing.
>>>> The
>>>>indirect calls should be bad. Of course the range checking in a switch is also
>>>>bad,
>>>>but I have seen compilers that would let me turn this off...
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.