Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Simple optimization question

Author: Robert Hyatt

Date: 18:41:37 01/09/04

Go up one level in this thread


On January 09, 2004 at 18:23:33, Gerd Isenberg wrote:

>On January 09, 2004 at 17:26:42, Dezhi Zhao wrote:
>
>>On January 09, 2004 at 10:05:27, Gerd Isenberg wrote:
>>
>>>On January 09, 2004 at 09:44:03, Robert Hyatt wrote:
>>>
>>>>On January 09, 2004 at 09:01:23, Gerd Isenberg wrote:
>>>>
>>>>>On January 09, 2004 at 06:46:00, Tord Romstad wrote:
>>>>>
>>>>>>By reading this forum, I've understood that "if" statements are considered
>>>>>>evil and that it is often a good idea to remove them if it is possible.  Suppose
>>>>>>that I have code which looks like this:
>>>>>>
>>>>>>if(x) y += 20;
>>>>>>
>>>>>>Would it then be advantageous to rewrite the code like this?
>>>>>>
>>>>>>y += (!(!x))*20;
>>>>>>
>>>>>>In my evaluation function, I have a lot of conditionals which could be avoided
>>>>>>by
>>>>>>using tricks similar to the one above, but before doing it I would like to make
>>>>>>sure it is really a good idea.  After all, the first form above is much more
>>>>>>readable.
>>>>>>
>>>>>>Tord
>>>>>
>>>>>Hi Tord,
>>>>>
>>>>>in general it is a good idea to avoid branches with todays super pipelined
>>>>>processors, specially if the branch-body is small and the condition is "random"
>>>>>and difficult to predict for the processor.
>>>>>
>>>>>The drawback with y += (x!=0)*K is the need of an additional register and more
>>>>>instructions, so it only pays off, if the register pressure is rather low, the
>>>>>condition is random and the target is already loaded inside a register:
>>>>>
>>>>>if (eax > ebx) ecx += 20;
>>>>>
>>>>>   cmp  eax, ebx
>>>>>   jle  l1
>>>>>   add  ecx, 20
>>>>>l1:
>>>>>
>>>>>ecx += (eax > ebx) * 20;
>>>>>
>>>>>   xor  edx, edx ; zero edx, because set instruction use byte register only
>>>>>   cmp  eax, ebx
>>>>>   setg dl       ; edx := (eax > ebx)
>>>>>   shl  edx, 2   ; * 4
>>>>>   lea  edx,[edx+edx*4] ; * 5
>>>>>   add  ecx, edx
>>>>>
>>>>>Depending on the constant, the multiplication may done by shift,add,lea
>>>>>instructions, but with some constants (or even variables) it is faster to avoid
>>>>>the "mul" and to use "and" with a boolean mask (-true=>0xffffffff,-false=>0):
>>>>>
>>>>>y += -(x != 0) & z;
>>>>
>>>>Won't _that_ have a branch in it as well?  Or else at least a long
>>>>pipeline stall where you want the result of the comparison before you
>>>>can use it (ie some sort of setxx instruction as above)?
>>>
>>>With simple 32-bit int conditions i found it always branchless with set
>>>instruction (MSVC6). With register dependencies you are right, but probably
>>>less expensive as 50% miss predictions, specially considering out of order
>>>execution of other instructions.
>>>
>>
>>one more point here. Nowadays the adc, sbb instructions that are often used to
>>impleament the trick are quite slow.  setcond is is not too bad, but not as good
>>as before.  And of course, the trick often incress register pressure.
>>So the bottom line is to do a test in your application:)
>
>Sure, but why are adc/sbb slow? On P4? Because of flag dependency?
>AMD reports one cycle latency like add/sub. My compiler produces a setCC
>variation:
>

What worries me about the above is the long pipeline on the PIV.  The flag
is not written to until near the end, but you need the flag for the setxx
instruction way early.  More than one microprocessor chose to not use
flags/condition-codes for this very reason...

It might slide by with register renaming, since I suppose the flag register
_could_ be included (I have not tried to study this to be sure)...





This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.