Author: Andreas Guettinger
Date: 14:09:02 12/21/05
On December 21, 2005 at 16:18:19, Tord Romstad wrote:

>On December 21, 2005 at 15:30:07, Andreas Guettinger wrote:
>
>>Well, the problem is that the -DINLINE_PPC flag does nothing on your sources. :)
>>I realized that a big part of the speedup seems to come from the asm
>>bitoperating routines.
>
>I see. Thanks for the code and explanations.
>
>Is there any easy way to achieve similar speeds with plain, portable C
>code? If resorting to inline assembly language is really necessary to
>produce a fast 64 bit binary, I would be very disappointed. I detest
>assembly language.
>
>Tord

Well, it's actually only two (2) lines of assembly code (one command) if you look at it. I'm quite surprised that the speedup from the assembly instruction cntlzd (count leading zeros) is that high. It is a 64-bit instruction, though; the equivalent for 32-bit would be cntlzw.

from ppc_intrinsics.h:

/*
 * __cntlzw - Count Leading Zeros Word
 * __cntlzd - Count Leading Zeros Double Word
 */
#define __cntlzw(a) __builtin_clz(a)
#define __cntlzd(a) __builtin_clzll(a)

I agree that asm code is bad for portability; I don't like it either. Crafty falls back to plain C code for FirstOne() and LastOne() if neither inline_asm nor inline_ppc is used. On the G4 I found the speedup from using cntlzw instead to be negligible (maybe 5%). Surprisingly, this seems to be different for 64-bit.

Crafty seems to rely heavily on these bit-counting functions, but that may be different for other engines, and they may not profit much from the asm code. Additionally, the C lookup-table approach that Crafty uses in place of the asm functions might not be the fastest for PPC. There was a thread some time ago where different algorithms were compared (magic bitscan, table, gerd, eugene, etc.). I still have the source for them lying around somewhere.

regards
Andy
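
To illustrate the portability point, here is a minimal sketch of a count-leading-zeros routine that uses the compiler builtin where it is available and plain C otherwise. The names leading_zeros64, leading_zeros64_portable and msb_index are made up for this example and are not taken from Crafty or any other engine; the fallback is a simple shift-based search, not the lookup-table code Crafty uses. Whether the builtin actually compiles down to a single cntlzd depends on the compiler and target, so treat that as an assumption to check in the generated assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper names, for illustration only. */

/* Plain C fallback: binary search for the highest set bit using shifts.
   The result is undefined for x == 0, so that case is rejected up front. */
static inline int leading_zeros64_portable(uint64_t x)
{
    int n = 0;
    assert(x != 0);
    if ((x >> 32) == 0) { n += 32; x <<= 32; }
    if ((x >> 48) == 0) { n += 16; x <<= 16; }
    if ((x >> 56) == 0) { n += 8;  x <<= 8;  }
    if ((x >> 60) == 0) { n += 4;  x <<= 4;  }
    if ((x >> 62) == 0) { n += 2;  x <<= 2;  }
    if ((x >> 63) == 0) { n += 1; }
    return n;
}

/* Use the GCC-style builtin when the compiler provides it;
   __builtin_clzll is also undefined for a zero argument. */
static inline int leading_zeros64(uint64_t x)
{
    assert(x != 0);
#if defined(__GNUC__)
    return __builtin_clzll(x);
#else
    return leading_zeros64_portable(x);
#endif
}

/* Index of the most significant set bit, counting bit 0 as the lowest. */
static inline int msb_index(uint64_t x)
{
    return 63 - leading_zeros64(x);
}
```

With this layout the engine code only ever calls leading_zeros64() or msb_index(), so no inline assembly appears in the sources, and the fallback can be swapped for a table or magic-bitscan variant without touching the callers.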
Last modified: Thu, 15 Apr 21 08:11:13 -0700