Author: Gerd Isenberg
Date: 01:57:59 03/20/05
On March 19, 2005 at 15:12:27, Daniel Mehrmannn wrote:

>On March 19, 2005 at 09:51:46, Gerd Isenberg wrote:
>
>>On March 19, 2005 at 04:41:26, Thomas Gaksch wrote:
>>
>>>On March 19, 2005 at 04:22:58, Thomas Gaksch wrote:
>>>
>>>>sorry, i never used the makefile or an other compiler than vc. i always build
>>>>the code with microsoft visual c++ 6.0. with visual c++ 6.0 it works with only
>>>>including #include <math.h>
>>>>but math.h is only needed for the abs function. you also could write a macro for
>>>>the abs function.
>>>>thomas
>>>
>>>sorry, i forgot the example for an abs macro:
>>>#define ABS(x) ((x)<0?-(x):(x))
>>>
>>>bye
>>>thomas
>>
>>Hi Thomas,
>>
>>with good predictable data pattern the macro, producing a conditional jump is
>>fine. Trying to outperform the compiler with arithmetic shift right to force
>>branchless code is often worse, ...
>>
>>long abs(long x) {
>>  long y;
>>  y = x >> 31;
>>  return (x ^ y) - y;
>>}
>>
>>... until compiler is smart enough to produce this x86 code which seems shortest
>>and fastest abs() at least for random data. And iirr this is the code of msc6
>>abs-intrinsic:
>>
>>  99      cdq
>>  33 C2   xor eax,edx
>>  2B C2   sub eax,edx
>>
>>See http://chessprogramming.org/cccsearch/ccc.php?art_id=304882
>>or use CCC-Search-Engine with Subject "Implementation of the abs() function" to
>>find further details.
>>
>>Cheers,
>>Gerd
>
>Hi Gerd :)
>
>I tried the same and my implementation was also slower than the MS VC 6.0 include
>:(
>
>Daniel

Hi Daniel,

yes, the msc abs intrinsic (available when math.h is included) is 5 bytes of branchless code and seems fastest. A minor drawback, if register pressure is high, might be the fixed register usage due to the cdq instruction, which sign-extends eax into edx:eax. So edx contains either 0 if eax >= 0, or -1 (0xffffffff) if eax < 0. The trailing xor and sub do nothing if eax >= 0.
Otherwise xor with -1 builds the ones' complement, while subtracting -1 (that is, adding 1) yields the final two's complement: -x = ~x + 1.

Under high register pressure and with well-predictable conditions, using only one register and a conditional jump might be faster:

    cmp ecx, 0       ; or any other register
    jge absready     ; nothing to do if already non-negative
    neg ecx
  absready:

Probably such minor tricks are the subject of profile-guided optimization with future (or already current?) compilers. But imho this explicit abs code should also produce optimal code:

    inline long abs(long x) {
      long y;
      y = x >> (sizeof(long)*8-1);  /* shift by bit width - 1, i.e. 31 for a 32-bit long */
      return (x ^ y) - y;
    }

with

    cdq

instead of

    mov edx, eax
    sar edx, 31

Gerd
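
For anyone who wants to try the explicit abs code outside of an intrinsic, here is a minimal sketch in C. The names abs_branchless and abs_branchy are only illustrative, they are not from the thread. It assumes that >> on a negative signed long is an arithmetic shift, which is implementation-defined in C but is what the x86 compilers discussed here do. The shift count is written with CHAR_BIT so it does not depend on long being 32 bits wide.

```c
#include <limits.h>  /* CHAR_BIT */

/* Branchless absolute value (illustrative name).
   y becomes 0 for x >= 0 and -1 (all bits set) for x < 0, assuming an
   arithmetic right shift. Then (x ^ y) - y is x for y == 0, and
   ~x + 1 == -x for y == -1.
   As with any abs, LONG_MIN has no positive counterpart and overflows. */
long abs_branchless(long x)
{
    long y = x >> (sizeof(long) * CHAR_BIT - 1);
    return (x ^ y) - y;
}

/* Reference version with a conditional, equivalent to the ABS macro
   quoted above (illustrative name). */
long abs_branchy(long x)
{
    return x < 0 ? -x : x;
}
```

Both functions should agree for every value except LONG_MIN. Whether the compiler actually turns the first one into cdq / xor / sub depends on the compiler and on register allocation, so it is worth checking the generated assembly.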
Last modified: Thu, 15 Apr 21 08:11:13 -0700