Computer Chess Club Archives



Subject: Re: 64Bit optimize coding - My experience (AMD64)

Author: Frank E. Oldham

Date: 23:08:46 12/21/05


On December 21, 2005 at 17:09:02, Andreas Guettinger wrote:

>On December 21, 2005 at 16:18:19, Tord Romstad wrote:
>
>>On December 21, 2005 at 15:30:07, Andreas Guettinger wrote:
>>
>>>Well, the problem is that the -DINLINE_PPC flag does nothing on your sources. :)
>>>I realized that a big part of the speedup seems to come from the asm
>>>bit-manipulation routines.
>>
>>I see.  Thanks for the code and explanations.
>>
>>Is there any easy way to achieve similar speeds with plain, portable C
>>code?  If resorting to inline assembly language is really necessary to
>>produce a fast 64 bit binary, I would be very disappointed.  I detest
>>assembly language.
>>
>>Tord
>
>Well, it's actually only two lines of assembly code (a single instruction) if
>you look at it.
>I'm quite surprised that the speedup from the assembly instruction cntlzd (count
>leading zeros doubleword) is that high. It operates on 64-bit values, though;
>the 32-bit equivalent would be cntlzw.
>
>from ppc_intrinsics.h:
>
>/*
> * __cntlzw - Count Leading Zeros Word
> * __cntlzd - Count Leading Zeros Double Word
> */
>
>#define __cntlzw(a)     __builtin_clz(a)
>#define __cntlzd(a)     __builtin_clzll(a)
>
>
>I agree that asm code is bad for portability; I don't like it either. Crafty
>falls back to plain C code for FirstOne() and LastOne() if neither inline_asm
>nor inline_ppc is used. On the G4 I found the speedup from using cntlzw instead
>to be negligible (maybe 5%). Surprisingly, this seems to be different for 64-bit.
>Crafty seems to rely heavily on these bit-counting functions, but that may be
>different for other engines, and they may not profit much from the asm code.
>Additionally, the C lookup-table approach that Crafty uses in place of the asm
>functions might not be the fastest for PPC. There was a thread some time ago
>where different algorithms were compared (magic bitscan, table, gerd, eugene,
>etc.). I still have the source for them lying around somewhere.
>
>regards
>Andy

It's more the fact that gcc 3.3 generates such bad code when calling the
non-inline versions.
BTW, gcc 4.0.1 and OS X 10.4.3 give a slight speedup in Crafty 19.9, perhaps
because -fast becomes usable. If you can't use it for your sources, then at
least put in some of the alignment options; they can make a big difference.
Also, you should become familiar with Shark, Apple's free profiling tool, which
can show you problems in your code. Shark is part of the CHUD package, and
there's a pretty good tutorial for it on the main developer pages.

Frank
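
For readers following the thread, here is a minimal sketch of how a leading-zero count can be kept inline without writing any assembly. It is not taken from Crafty or from anyone's sources in this thread: the names clz64 and clz64_portable are hypothetical, and the version check assumes that __builtin_clzll is available from gcc 3.4 onward. With the builtin, the compiler is free to emit a single instruction such as cntlzd where the target has one; other compilers get a plain C binary search.

```c
#include <stdint.h>

/* Hypothetical helper names, chosen for illustration only. */

/* Plain C fallback: binary search for the highest set bit.
 * Returns the number of leading zero bits, or 64 for x == 0. */
static inline int clz64_portable(uint64_t x)
{
    int n = 0;

    if (x == 0)
        return 64;
    if ((x >> 32) == 0) { n += 32; x <<= 32; }
    if ((x >> 48) == 0) { n += 16; x <<= 16; }
    if ((x >> 56) == 0) { n += 8;  x <<= 8;  }
    if ((x >> 60) == 0) { n += 4;  x <<= 4;  }
    if ((x >> 62) == 0) { n += 2;  x <<= 2;  }
    if ((x >> 63) == 0) { n += 1; }
    return n;
}

/* Inline wrapper: use the compiler builtin when it exists.
 * The result of __builtin_clzll is undefined for 0, so that
 * case is handled explicitly. */
static inline int clz64(uint64_t x)
{
#if defined(__GNUC__) && (__GNUC__ >= 4 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 4))
    return x ? __builtin_clzll(x) : 64;
#else
    return clz64_portable(x);
#endif
}
```

Because both functions are static inline, the call overhead discussed above should disappear at normal optimization levels. Whether the fallback is competitive with a lookup table on a given PPC is something a profiler would have to confirm.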



Last modified: Thu, 15 Apr 21 08:11:13 -0700
