Computer Chess Club Archives


Subject: Re: Fruit 2.0 Toga : Recapture extension

Author: Gerd Isenberg

Date: 01:57:59 03/20/05


On March 19, 2005 at 15:12:27, Daniel Mehrmannn wrote:

>On March 19, 2005 at 09:51:46, Gerd Isenberg wrote:
>
>>On March 19, 2005 at 04:41:26, Thomas Gaksch wrote:
>>
>>>On March 19, 2005 at 04:22:58, Thomas Gaksch wrote:
>>>
>>>>sorry, i never used the makefile or an other compiler than vc. i always build
>>>>the code with microsoft visual c++ 6.0. with visual c++ 6.0 it works with only
>>>>including #include <math.h>
>>>>but math.h is only needed for the abs function. you also could write a macro for
>>>>the abs function.
>>>>thomas
>>>
>>>sorry, i forgot the example for an abs macro:
>>>#define ABS(x) ((x)<0?-(x):(x))
>>>
>>>bye
>>>thomas
>>
>>Hi Thomas,
>>
>>with good predictable data pattern the macro, producing a conditional jump is
>>fine. Trying to outperform the compiler with arithmetic shift right to force
>>branchless code is often worse, ...
>>
>>long abs(long x) {
>> long y;
>> y = x >> 31;
>> return (x ^ y) - y;
>>}
>>
>>... until compiler is smart enough to produce this x86 code which seems shortest
>>and fastest abs() at least for random data. And iirr this is the code of msc6
>>abs-intrinsic:
>>
>> 99    cdq
>> 33 C2 xor eax,edx
>> 2B C2 sub eax,edx
>>
>>See http://chessprogramming.org/cccsearch/ccc.php?art_id=304882
>>or use CCC-Search-Engine with Subject "Implementation of the abs() function" to
>>find further details.
>>
>>Cheers,
>>Gerd
>
>Hi Gerd :)
>
>I tried the same and my implementation was also slower than the MS VC 6.0 include
>:(
>
>Daniel


Hi Daniel,

yes, the msc abs intrinsic (available once math.h is included) is 5 bytes,
branchless, and seems fastest. A minor drawback, if register pressure is high,
might be the fixed register usage due to the cdq instruction, which sign-extends
eax into edx:eax. After cdq, edx contains either 0 if eax >= 0 or -1 (0xffffffff)
if eax < 0. The trailing xor and sub do nothing if eax >= 0. Otherwise the xor
with -1 builds the ones' complement, and subtracting -1 (i.e. adding 1) yields
the final two's complement:
-x = ~x+1
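
To make the three steps concrete, here is a minimal C sketch of the same idea. The names sign_mask and abs_branchless are hypothetical, and the mask is built with a comparison rather than a shift so that the code does not rely on how a given compiler shifts negative values. The arithmetic is done in unsigned 32-bit integers, which is an assumption of this sketch (it keeps the most negative input well defined), not something the msc intrinsic is known to do.

```c
#include <stdint.h>

/* Plays the role of edx after cdq: all ones if x is negative, else zero. */
static uint32_t sign_mask(int32_t x) {
    return (uint32_t)0 - (uint32_t)(x < 0);
}

/* Branchless absolute value, mirroring: cdq ; xor eax,edx ; sub eax,edx.
   Unsigned arithmetic wraps modulo 2^32, so INT32_MIN maps to 2147483648
   instead of overflowing. */
static uint32_t abs_branchless(int32_t x) {
    uint32_t m  = sign_mask(x);   /* 0 or 0xffffffff             */
    uint32_t ux = (uint32_t)x;    /* same bit pattern as x       */
    return (ux ^ m) - m;          /* ~x + 1 if negative, else x  */
}
```

For x = -5 the mask is 0xffffffff, the xor gives 4 (the ones' complement of -5), and subtracting the mask adds 1, giving 5. For non-negative x the mask is 0 and both operations leave the value unchanged.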

Under high register pressure and with well-predictable conditions,
using only one register and a conditional jump might be faster.

cmp  ecx, 0     ; or any other register
jge  absready   ; skip the negation if ecx >= 0
neg  ecx
absready:
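
At the C level, the one-register variant corresponds roughly to the following sketch. The name abs_branch is hypothetical, and whether a compiler turns this into a conditional jump, a conditional move, or the cdq sequence depends on the compiler and its options, so the generated code should be inspected rather than assumed.

```c
/* Branching absolute value: negate only when the input is negative.
   As with the library abs(), the result for LONG_MIN is undefined
   because -LONG_MIN does not fit in a long. */
static long abs_branch(long x) {
    if (x < 0)
        x = -x;   /* corresponds to: neg ecx */
    return x;
}
```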

Probably such minor tricks are a subject for profile-guided optimization with
future (or already current?) compilers.

But imho this explicit abs-code should also produce optimal code:

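inline long abs(long x) {
 long y;
 /* sign mask: 0 or -1; shift by the bit width minus one (8-bit bytes and
    an arithmetic right shift of negative values assumed) */
 y = x >> (sizeof(long)*8-1);
 return (x ^ y) - y;
}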

where the compiler should emit
cdq instead of
mov edx, eax
sar edx, 31

Gerd



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.