Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Simple optimization question

Author: Gerd Isenberg

Date: 15:23:33 01/09/04

Go up one level in this thread


On January 09, 2004 at 17:26:42, Dezhi Zhao wrote:

>On January 09, 2004 at 10:05:27, Gerd Isenberg wrote:
>
>>On January 09, 2004 at 09:44:03, Robert Hyatt wrote:
>>
>>>On January 09, 2004 at 09:01:23, Gerd Isenberg wrote:
>>>
>>>>On January 09, 2004 at 06:46:00, Tord Romstad wrote:
>>>>
>>>>>By reading this forum, I've understood that "if" statements are considered
>>>>>evil and that it is often a good idea to remove them if it is possible.  Suppose
>>>>>that I have code which looks like this:
>>>>>
>>>>>if(x) y += 20;
>>>>>
>>>>>Would it then be advantageous to rewrite the code like this?
>>>>>
>>>>>y += (!(!x))*20;
>>>>>
>>>>>In my evaluation function, I have a lot of conditionals which could be avoided
>>>>>by
>>>>>using tricks similar to the one above, but before doing it I would like to make
>>>>>sure it is really a good idea.  After all, the first form above is much more
>>>>>readable.
>>>>>
>>>>>Tord
>>>>
>>>>Hi Tord,
>>>>
>>>>in general it is a good idea to avoid branches with todays super pipelined
>>>>processors, specially if the branch-body is small and the condition is "random"
>>>>and difficult to predict for the processor.
>>>>
>>>>The drawback with y += (x!=0)*K is the need of an additional register and more
>>>>instructions, so it only pays off, if the register pressure is rather low, the
>>>>condition is random and the target is already loaded inside a register:
>>>>
>>>>if (eax > ebx) ecx += 20;
>>>>
>>>>   cmp  eax, ebx
>>>>   jle  l1
>>>>   add  ecx, 20
>>>>l1:
>>>>
>>>>ecx += (eax > ebx) * 20;
>>>>
>>>>   xor  edx, edx ; zero edx, because set instruction use byte register only
>>>>   cmp  eax, ebx
>>>>   setg dl       ; edx := (eax > ebx)
>>>>   shl  edx, 2   ; * 4
>>>>   lea  edx,[edx+edx*4] ; * 5
>>>>   add  ecx, edx
>>>>
>>>>Depending on the constant, the multiplication may done by shift,add,lea
>>>>instructions, but with some constants (or even variables) it is faster to avoid
>>>>the "mul" and to use "and" with a boolean mask (-true=>0xffffffff,-false=>0):
>>>>
>>>>y += -(x != 0) & z;
>>>
>>>Won't _that_ have a branch in it as well?  Or else at least a long
>>>pipeline stall where you want the result of the comparison before you
>>>can use it (ie some sort of setxx instruction as above)?
>>
>>With simple 32-bit int conditions i found it always branchless with set
>>instruction (MSVC6). With register dependencies you are right, but probably
>>less expensive as 50% miss predictions, specially considering out of order
>>execution of other instructions.
>>
>
>one more point here. Nowadays the adc, sbb instructions that are often used to
>impleament the trick are quite slow.  setcond is is not too bad, but not as good
>as before.  And of course, the trick often incress register pressure.
>So the bottom line is to do a test in your application:)

Sure, but why are adc/sbb slow? On P4? Because of flag dependency?
AMD reports one cycle latency like add/sub. My compiler produces a setCC
variation:

int foo(int a, int b, int c)
{
	if ( a != 0 ) b += c;
	b += ((a == 0)-1) & c;

	__asm // for code size reasons
	{
		xor  edx, edx
		cmp  ebx, 1   ; carry (borrow) := (eax == 0)
		adc  edx, -1   ; edx := (eax == 0)-1
		and  edx, ecx
		add  eax, edx ; +=
	}
	return b;
}


; Function compile flags: /Ogtay
_TEXT	SEGMENT
_a$ = 8
_b$ = 12
_c$ = 16
?foo@@YAHHHH@Z PROC NEAR				; foo
; Line 1481
  01b60	8b 4c 24 04	 mov	 ecx, DWORD PTR _a$[esp-4]
  01b64	85 c9		 test	 ecx, ecx
  01b66	56		 push	 esi
  01b67	8b 74 24 10	 mov	 esi, DWORD PTR _c$[esp]
  01b6b	74 04		 je	 SHORT $L43006
  01b6d	01 74 24 0c	 add	 DWORD PTR _b$[esp], esi
$L43006:
  01b71	33 d2		 xor	 edx, edx
  01b73	83 fb 01	 cmp	 ebx, 1
  01b76	83 d2 ff	 adc	 edx, -1
  01b79	23 d1		 and	 edx, ecx
  01b7b	03 c2		 add	 eax, edx
; Line 1484
  01b7d	33 c0		 xor	 eax, eax
  01b7f	85 c9		 test	 ecx, ecx
  01b81	8b 4c 24 0c	 mov	 ecx, DWORD PTR _b$[esp]
  01b85	0f 94 c0	 sete	 al
  01b88	48		 dec	 eax
  01b89	23 c6		 and	 eax, esi
  01b8b	03 c1		 add	 eax, ecx
  01b8d	5e		 pop	 esi
; Line 1495
  01b8e	c3		 ret	 0
?foo@@YAHHHH@Z ENDP					; foo



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.