Author: Robert Hyatt
Date: 11:46:13 01/19/06
Go up one level in this thread
On January 19, 2006 at 14:26:11, Gerd Isenberg wrote:
>On January 19, 2006 at 12:45:10, Chan Rasjid wrote:
>
>>
>>When using bitboards, often we have arrays that require only indices 0 / 1.
>>eg if we have no underpromotions, then there is either 1/2 knights per color.
>>Sometimes ,in such situations, codes may be re-written without using the if ()
>>statement. But whether such simple tricks can avoid the cost of branch
>>misprediction ( I only have a vague idea about this ).
>>
>>X[i] where i = 0 / 1;
>>
>>1) if ( a & b ){
>> j = X[1];
>> }else{
>> j = X[0];
>> }
>> //etc.. codes dependent on j.
>>
>>2) Now without if() :-
>>
>> j = X[(a & b)!= 0];
>> //etc.. codes dependent on j.
>
>Hi Rasjid,
>
>whether a boolean statement works without branches depends on the compiler and
>architecture. Recent x86 compiler should do at least 2) branchless, using a
>setnz-instruction for (a & b)!= 0, for instance:
>
>xor edx, edx ; clear dword
>test rax, rbx ; get the zero flag
>setnz dl ; edx := (a & b) ? 1 : 0
>mov edx, X[edx*4] ; assuming X is int array
>
>>
>>How (if ? ) can such a simple re-coding cause an improvement. Can there
>>be some situations related to the above with significanr misprediction
>>overheads ?
>
>If the pattern is hard to predict and/or you have so many branches, that the
>branch target buffer is likely to get trashed, i think it is a good idea to
>avoid conditional jumps. Specially with very small conditional bodies in small,
>but often used inlined functions or macros.
One note. A good branch predictor (I have not looked at PIV carefully) has the
ability to do both "correlated and non-correlated prediction" with a tournament
predictor. The idea is that if you have lots of if (wtm) type things, they are
all "correlated". If one is taken, all are taken, and vice-versa. That can be
handled (Dec did this on the alpha).
>
>With pattern like 99% of the time 00010001... (for instance 0=then case taken,
>1=else case taken) conditional jumps are fine. You may try to avoid else cases
>if possible as well - or leave that and possible branchless tricks to the
>"pogoable" compiler.
>
>While x86 branch replace instructions like setcc, cmovcc, sbb and adc are flag
>dependent, they don't allow a more dense instruction scheduling for multiple
>conditional assignments. That is not true for "sign flag" dependency after a
>subtraction, where the sign flag is also present as MSB inside a register. So
>one may use logical shift right 31 to get boolean {0,1}-values or arithmetical
>shift right 31 (or cdq) for {0,-1}-masks, if you are aware of the range of the
>signed or unsigned values to subtract, for instance:
>
>if ( a > b ) x += 7;
>if ( c > d ) x += 5;
>
>// use < 0 expressions - considering ranges!
>// this unsafe optimization can not be done by the compiler!
>if ( (signed)(b-a) < 0 ) x += 7;
>if ( (signed)(d-c) < 0 ) x += 5;
>
>// the branchless approach with
>// some prospects of parallel work
>x += ((signed)(b-a)>>31) & 7;
>x += ((signed)(d-c)>>31) & 5;
>
>possible assembly:
>
>sub ebx, eax ; b-a
>sub edx, ecx ; d-c
>sar ebx, 31
>sar edx, 31
>and ebx, 7
>and edx, 5
>add r08, ebx ; x += ((signed)(b-a)>>31) & 7
>add r08, edx ; x += ((signed)(d-c)>>31) & 5
>
>Pairs quite well, but P4 sucks due to slow shift.
>On amd that will take about the same number of cycles as one sequential
>conditional assignment. If processing vectors of this stuff, sse2 comes in mind.
>
>Gerd
>
I like the fact that you are now a "rax" person. :) Of course, don't try that
at home on your old X86 system. I'm waiting on our new intel X86-64 cluster to
"come alive" to see how intel's latest X86-64 stuff actually works. Will
provide some data before long I hope...
>>
>>Thanks
>>Rasjid
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.