Author: Gerd Isenberg
Date: 11:38:28 11/18/05
Go up one level in this thread
I am a bit confused about this statement in the amd64 guide:
"To avoid a comparison chain and its undesirable effects on branch prediction,
replace the switch statement with a series of if-else statements..."
I thought a series of if-else statements _is_ a comparison chain?
Cheers,
Gerd
--------------------------------------------------------------------------
How to optimize for the Pentium family of microprocessors By Agner Fog, Ph.D.
Copyright © 1996 - 2004. Last updated 2004-04-16.
12.3 Branch prediction in PMMX, PPro, P2, and P3 pg 43
....
Indirect jumps and calls (PMMX, PPro, P2 and P3)
There is no pattern recognition for indirect jumps and calls, and the BTB can
remember no more than one target for an indirect jump. It is simply predicted to
go to the same target as it did last time.
...
12.4 Branch prediction in P4 pg 45
...
Branch mispredictions are much more expensive on the P4 than on previous
generations of microprocessors. The time it takes to recover from a
misprediction is rarely less than 24 clock cycles, and typically 45 uops.
Apparently, the microprocessor cannot cancel a bogus uop before it has reached
the retirement stage. This means that if you have a lot of uops with long
latency or poor throughput, then the penalty for a misprediction may be as high
as 100 clock cycles or more. It is therefore very important to organize code so
that the number of mispredictions is minimized.
12.5 Indirect jumps (all processors) pg 47
While an unconditional jump always goes to the same target, indirect jumps,
indirect calls, and returns may go to a different address each time. The
prediction method for an indirect jump or indirect call is, in all processors,
simply to predict that it will go to the same target as last time it was
executed. The first time an indirect jump or indirect call is seen, it is
predicted to go to the immediately following instruction. Therefore, an indirect
jump or call should always be followed by valid code. Don't place a list of jump
addresses immediately after an indirect jump or call. Such a list should
preferably be placed in the data segment, rather than the code segment.
Multiway branches (switch/case statements) are implemented either as an indirect
jump using a list of jump addresses, or as a tree of branch instructions. Since
indirect jumps are poorly predicted, the latter method may be preferred if
easily predicted patterns can be expected and you have enough BTB entries.
...
--------------------------------------------------------------------------
Understanding the detailed Architecture of AMD's 64 bit Core
by Hans de Vries
4.12 Instruction Cache Hit / Miss detection, The Current Page and BTAC
...
The BTAC does not help in case of indirect branches. These still have to wait
until the correct address becomes available from the retired branch instruction.
--------------------------------------------------------------------------
Software Optimization Guide for AMD64 Processors
2.10 SWITCH and Noncontiguous Case Expressions
Optimization
Use if-else statements in place of switch statements that have noncontiguous
case expressions. (Case expressions are the individual expressions to which the
single switch expression is compared.)
...
Rationale
If the case expressions are contiguous or nearly contiguous integer values, most
compilers translate the switch statement as a jump table instead of a comparison
chain. Jump tables generally improve performance because:
• They reduce the number of branches to a single procedure call.
• The size of the control-flow code is the same no matter how many cases there
are.
• The amount of control-flow code that the processor must execute is the same
for all values of the switch expression.
However, if the case expressions are noncontiguous values, most compilers
translate the switch statement as a comparison chain. Comparison chains are
undesirable because:
• They use dense sequences of conditional branches, which interfere with the
processor’s ability to successfully perform branch prediction.
• The size of the control-flow code increases with the number of cases.
• The amount of control-flow code that the processor must execute varies with
the value of the switch expression.
....
Example 2
Because the case expressions in the following switch statement are not
contiguous values, the compiler will likely translate the code into a comparison
chain instead of a jump table:
switch (a)
{
case 8:
// Sequence for a==8
break;
case 16:
// Sequence for a==16
break;
...
default:
// Default sequence
break;
}
To avoid a comparison chain and its undesirable effects on branch prediction,
replace the switch statement with a series of if-else statements, as follows:
if (a==8) {
// Sequence for a==8
}
else if (a==16) {
// Sequence for a==16
}
...
else {
// Default sequence
}
This page took 0.01 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.