Author: Dann Corbit
Date: 13:16:52 11/18/05
Go up one level in this thread
On November 18, 2005 at 14:38:28, Gerd Isenberg wrote:
>I am a bit confused about this statement in the amd64 guide:
>
>"To avoid a comparison chain and its undesirable effects on branch prediction,
>replace the switch statement with a series of if-else statements..."
>
>I thought a series of if-else statements _is_ a comparison chain?
>
>Cheers,
>Gerd
>
>--------------------------------------------------------------------------
>
>How to optimize for the Pentium family of microprocessors By Agner Fog, Ph.D.
>Copyright © 1996 - 2004. Last updated 2004-04-16.
>
>12.3 Branch prediction in PMMX, PPro, P2, and P3 pg 43
>....
>Indirect jumps and calls (PMMX, PPro, P2 and P3)
>There is no pattern recognition for indirect jumps and calls, and the BTB can
>remember no more than one target for an indirect jump. It is simply predicted to
>go to the same target as it did last time.
>...
>
>12.4 Branch prediction in P4 pg 45
>...
>Branch mispredictions are much more expensive on the P4 than on previous
>generations of microprocessors. The time it takes to recover from a
>misprediction is rarely less than 24 clock cycles, and typically 45 uops.
>Apparently, the microprocessor cannot cancel a bogus uop before it has reached
>the retirement stage. This means that if you have a lot of uops with long
>latency or poor throughput, then the penalty for a misprediction may be as high
>as 100 clock cycles or more. It is therefore very important to organize code so
>that the number of mispredictions is minimized.
>
>12.5 Indirect jumps (all processors) pg 47
>
>While an unconditional jump always goes to the same target, indirect jumps,
>indirect calls, and returns may go to a different address each time. The
>prediction method for an indirect jump or indirect call is, in all processors,
>simply to predict that it will go to the same target as last time it was
>executed. The first time an indirect jump or indirect call is seen, it is
>predicted to go to the immediately following instruction. Therefore, an indirect
>jump or call should always be followed by valid code. Don't place a list of jump
>addresses immediately after an indirect jump or call. Such a list should
>preferably be placed in the data segment, rather than the code segment.
>
>Multiway branches (switch/case statements) are implemented either as an indirect
>jump using a list of jump addresses, or as a tree of branch instructions. Since
>indirect jumps are poorly predicted, the latter method may be preferred if
>easily predicted patterns can be expected and you have enough BTB entries.
>...
>
>--------------------------------------------------------------------------
>
>Understanding the detailed Architecture of AMD's 64 bit Core
>by Hans de Vries
>
>4.12 Instruction Cache Hit / Miss detection, The Current Page and BTAC
>...
>The BTAC does not help in case of indirect branches. These still have to wait
>until the correct address becomes available from the retired branch instruction.
>
>--------------------------------------------------------------------------
>
>Software Optimization Guide for AMD64 Processors
>
>2.10 SWITCH and Noncontiguous Case Expressions
>
>Optimization
>Use if-else statements in place of switch statements that have noncontiguous
>case expressions. (Case expressions are the individual expressions to which the
>single switch expression is compared.)
>...
>
>Rationale
>If the case expressions are contiguous or nearly contiguous integer values, most
>compilers translate the switch statement as a jump table instead of a comparison
>chain. Jump tables generally improve performance because:
>• They reduce the number of branches to a single procedure call.
>• The size of the control-flow code is the same no matter how many cases there
> are.
>• The amount of control-flow code that the processor must execute is the same
> for all values of the switch expression.
>
>However, if the case expressions are noncontiguous values, most compilers
>translate the switch statement as a comparison chain. Comparison chains are
>undesirable because:
>• They use dense sequences of conditional branches, which interfere with the
> processor’s ability to successfully perform branch prediction.
>• The size of the control-flow code increases with the number of cases.
>• The amount of control-flow code that the processor must execute varies with
> the value of the switch expression.
>
>....
>Example 2
>Because the case expressions in the following switch statement are not
>contiguous values, the compiler will likely translate the code into a comparison
>chain instead of a jump table:
>
>switch (a)
>{
> case 8:
> // Sequence for a==8
> break;
> case 16:
> // Sequence for a==16
> break;
> ...
> default:
> // Default sequence
> break;
>}
>
>To avoid a comparison chain and its undesirable effects on branch prediction,
>replace the switch statement with a series of if-else statements, as follows:
>
>if (a==8) {
> // Sequence for a==8
>}
>else if (a==16) {
> // Sequence for a==16
>}
>...
>else {
> // Default sequence
>}
The compiler is free to do the replacement for you.
Never micro-optimize by changing the construct from the most sensible one to one
less sensible but which you might think to be faster.[*]
IMO-YMMV
[*] Unless you have benchmarked the code, it is a hot-spot, it causes the code
to fall below specification, and the modified code fixes the problem.
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.