Author: Frank E. Oldham
Date: 23:08:46 12/21/05
On December 21, 2005 at 17:09:02, Andreas Guettinger wrote:

>On December 21, 2005 at 16:18:19, Tord Romstad wrote:
>
>>On December 21, 2005 at 15:30:07, Andreas Guettinger wrote:
>>
>>>Well, the problem is that the -DINLINE_PPC flag does nothing on your sources. :)
>>>I realized that a big part of the speedup seems to come from the asm
>>>bit-operating routines.
>>
>>I see. Thanks for the code and explanations.
>>
>>Is there any easy way to achieve similar speeds with plain, portable C
>>code? If resorting to inline assembly language is really necessary to
>>produce a fast 64-bit binary, I would be very disappointed. I detest
>>assembly language.
>>
>>Tord
>
>Well, it's actually only two (2) lines of assembly code (one instruction) if you
>look at it.
>I'm quite surprised that the speedup from the assembly instruction cntlzd (count
>leading zeros) is that high. It works on 64 bits, though. The equivalent for
>32 bits would be cntlzw.
>
>From ppc_intrinsics.h:
>
>/*
> * __cntlzw - Count Leading Zeros Word
> * __cntlzd - Count Leading Zeros Double Word
> */
>
>#define __cntlzw(a) __builtin_clz(a)
>#define __cntlzd(a) __builtin_clzll(a)
>
>
>I agree that in terms of portability asm code is bad; I don't like it either. Crafty
>falls back to plain C code for FirstOne() and LastOne() if neither inline_asm nor
>inline_ppc is used. For the G4 I found the speedup from using cntlzw instead
>negligible (maybe 5%). Surprisingly, this seems to be different for 64-bit.
>Crafty seems to rely heavily on these bit-counting functions, but that may be
>different for other engines, which may not profit much from the asm code.
>Additionally, Crafty's C lookup-table approach that replaces the asm
>functions might not be the fastest for PPC. There was a thread some time ago
>where different algorithms were compared (magic bitscan, table, Gerd, Eugene,
>etc.). I still have the source for them lying around somewhere.
>
>regards
>Andy

The bigger factor is that gcc 3.3 generates such bad code for calls to the non-inline versions. BTW, gcc 4.0.1 and OS X 10.4.3 give a slight speedup in Crafty 19.9, perhaps because -fast becomes usable. If you can't use it for your sources, then at least put in some of the alignment options (such as GCC's -falign-functions and -falign-loops); they can make a big difference. Also, you should become familiar with Shark, Apple's free profiling tool, which can show you problems in your code. Shark is part of the CHUD package, and there's a pretty good tutorial for it on the main developer pages.

Frank
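To tie Tord's portability question to the point about non-inline calls, here is a minimal sketch, assuming GCC or Clang for the fast path: the leading-zero count is a `static inline` function (so it can live in a header and be expanded at each call site), and other compilers fall back to plain C. The fallback uses de Bruijn multiplication, one of the "magic bitscan" variants mentioned in the quoted thread; it is not Crafty's FirstOne()/LastOne() code. The names `clz64`, `msb_portable`, `init_bitscan`, `debruijn_index` and `DEBRUIJN64` are hypothetical, and the constant is a commonly used 64-bit de Bruijn value. Bits are numbered with 0 as the least significant bit, which may differ from Crafty's convention.

```c
#include <assert.h>
#include <stdint.h>

/* Commonly used 64-bit de Bruijn constant (hypothetical name). The top six
 * bits of DEBRUIJN64 << i are different for every i in 0..63. */
#define DEBRUIJN64 0x03f79d71b4cb0a89ULL

static int debruijn_index[64];

/* Call once at startup: multiplying an isolated bit 1 << i by DEBRUIJN64
 * equals DEBRUIJN64 << i, so its top six bits identify i uniquely. */
void init_bitscan(void)
{
    for (int i = 0; i < 64; i++)
        debruijn_index[(DEBRUIJN64 << i) >> 58] = i;
}

/* Portable C: index (0 = least significant) of the highest set bit.
 * Requires init_bitscan() to have been called; b must be non-zero. */
int msb_portable(uint64_t b)
{
    assert(b != 0);
    /* Smear the highest set bit into every lower position... */
    b |= b >> 1;
    b |= b >> 2;
    b |= b >> 4;
    b |= b >> 8;
    b |= b >> 16;
    b |= b >> 32;
    /* ...then keep only that bit and hash it with the de Bruijn multiply. */
    b ^= b >> 1;
    return debruijn_index[(b * DEBRUIJN64) >> 58];
}

/* Leading-zero count, the quantity cntlzd computes. As a static inline
 * function the compiler can expand it in place instead of emitting a call. */
static inline int clz64(uint64_t b)
{
    assert(b != 0);                 /* __builtin_clzll(0) is undefined */
#if defined(__GNUC__)
    return __builtin_clzll(b);      /* GCC/Clang builtin */
#else
    return 63 - msb_portable(b);    /* portable fallback */
#endif
}
```

Usage: call `init_bitscan()` once, then pass only non-zero words; for example, `msb_portable(0x80)` is 7 and `clz64(1)` is 63. Whether the portable path comes close to a single cntlzd on a given CPU is something to measure (for instance with Shark), not assume.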
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.