Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Avoiding if ( expr ) and branch misprediction ?

Author: Gerd Isenberg

Date: 11:26:11 01/19/06

Go up one level in this thread


On January 19, 2006 at 12:45:10, Chan Rasjid wrote:

>
>When using bitboards, often we have arrays that require only indices 0 / 1.
>eg if we have no underpromotions, then there is either 1/2 knights per color.
>Sometimes ,in such situations, codes may be re-written without using the if ()
>statement. But whether such simple tricks can avoid the cost of branch
>misprediction ( I only have a vague idea about this ).
>
>X[i] where i = 0 / 1;
>
>1) if ( a & b ){
>     j = X[1];
>   }else{
>     j = X[0];
>   }
>   //etc.. codes dependent on j.
>
>2) Now without if() :-
>
>   j = X[(a & b)!= 0];
>   //etc.. codes dependent on j.

Hi Rasjid,

whether a boolean statement works without branches depends on the compiler and
architecture. Recent x86 compiler should do at least 2) branchless, using a
setnz-instruction for (a & b)!= 0, for instance:

xor   edx, edx ; clear dword
test  rax, rbx ; get the zero flag
setnz dl       ; edx := (a & b) ? 1 : 0
mov   edx, X[edx*4] ; assuming X is int array

>
>How (if ? ) can such a simple re-coding cause an improvement. Can there
>be some situations related to the above with significanr misprediction
>overheads ?

If the pattern is hard to predict and/or you have so many branches, that the
branch target buffer is likely to get trashed, i think it is a good idea to
avoid conditional jumps. Specially with very small conditional bodies in small,
but often used inlined functions or macros.

With pattern like 99% of the time 00010001... (for instance 0=then case taken,
1=else case taken) conditional jumps are fine. You may try to avoid else cases
if possible as well - or leave that and possible branchless tricks to the
"pogoable" compiler.

While x86 branch replace instructions like setcc, cmovcc, sbb and adc are flag
dependent, they don't allow a more dense instruction scheduling for multiple
conditional assignments. That is not true for "sign flag" dependency after a
subtraction, where the sign flag is also present as MSB inside a register. So
one may use logical shift right 31 to get boolean {0,1}-values or arithmetical
shift right 31 (or cdq) for {0,-1}-masks, if you are aware of the range of the
signed or unsigned values to subtract, for instance:

if ( a > b ) x += 7;
if ( c > d ) x += 5;

// use < 0 expressions - considering ranges!
// this unsafe optimization can not be done by the compiler!
if ( (signed)(b-a) < 0 ) x += 7;
if ( (signed)(d-c) < 0 ) x += 5;

// the branchless approach with
// some prospects of parallel work
x += ((signed)(b-a)>>31) & 7;
x += ((signed)(d-c)>>31) & 5;

possible assembly:

sub   ebx, eax ; b-a
sub   edx, ecx ; d-c
sar   ebx, 31
sar   edx, 31
and   ebx, 7
and   edx, 5
add   r08, ebx ; x += ((signed)(b-a)>>31) & 7
add   r08, edx ; x += ((signed)(d-c)>>31) & 5

Pairs quite well, but P4 sucks due to slow shift.
On amd that will take about the same number of cycles as one sequential
conditional assignment. If processing vectors of this stuff, sse2 comes in mind.

Gerd

>
>Thanks
>Rasjid



This page took 0.01 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.