Author: Andreas Guettinger
Date: 04:14:27 12/22/05
On December 22, 2005 at 02:08:46, Frank E. Oldham wrote:

>On December 21, 2005 at 17:09:02, Andreas Guettinger wrote:
>
>>On December 21, 2005 at 16:18:19, Tord Romstad wrote:
>>
>>>On December 21, 2005 at 15:30:07, Andreas Guettinger wrote:
>>>
>>>>Well, the problem is that the -DINLINE_PPC flag does nothing on your sources. :)
>>>>I realized that a big part of the speedup seems to come from the asm
>>>>bit-manipulation routines.
>>>
>>>I see. Thanks for the code and explanations.
>>>
>>>Is there any easy way to achieve similar speeds with plain, portable C
>>>code? If resorting to inline assembly language is really necessary to
>>>produce a fast 64-bit binary, I would be very disappointed. I detest
>>>assembly language.
>>>
>>>Tord
>>
>>Well, it's actually only two (2) lines of assembly code (one instruction) if you
>>look at it.
>>I'm quite surprised that the speedup from the assembly instruction cntlzd (count
>>leading zeros) is that high. It works on 64-bit values, though. The equivalent for
>>32-bit values would be cntlzw.
>>
>>From ppc_intrinsics.h:
>>
>>/*
>> * __cntlzw - Count Leading Zeros Word
>> * __cntlzd - Count Leading Zeros Double Word
>> */
>>
>>#define __cntlzw(a) __builtin_clz(a)
>>#define __cntlzd(a) __builtin_clzll(a)
>>
>>I agree that in terms of portability asm code is bad; I don't like it either. Crafty
>>falls back to plain C code for FirstOne() and LastOne() if neither inline_asm nor
>>inline_ppc is used. On the G4 I found the speedup from using cntlzw instead to be
>>negligible (maybe 5%). Surprisingly, this seems to be different for 64-bit.
>>Crafty seems to rely heavily on these bit-counting functions, but that may be
>>different for other engines, and they may not profit much from the asm code.
>>Additionally, the C lookup-table approach Crafty uses in place of the asm
>>functions might not be the fastest for PPC. There was a thread some time ago
>>where different algorithms were compared (magic bitscan, table, Gerd, Eugene,
>>etc.). I still have the source for them lying around somewhere.
>>
>>regards
>>Andy
>
>It's more the fact that gcc 3.3 generates such bad code when calling the
>non-inline versions.
>BTW, gcc 4.0.1 and OS X 10.4.3 give a slight speedup in Crafty 19.9, perhaps
>because -fast becomes usable. If you can't use it for your sources, then at least
>put in some of the alignment options; they can make a big difference. Also, you
>should become familiar with Shark, Apple's free profiling tool, which can show you
>problems in your code. Shark is part of the CHUD package, and there's a pretty good
>tutorial for it on the main developer pages.
>
>Frank

I used gcc version 4.0.1 (Apple Computer, Inc. build 5247) for the speedup comparisons.

regards
Andy