Author: Frank E. Oldham
Date: 17:43:08 07/13/04
On July 13, 2004 at 19:56:44, Andreas Guettinger wrote:
>On July 13, 2004 at 18:40:29, Joshua Shriver wrote:
>
>>http://www.csee.wvu.edu/~jshriver/chess
>>
>>-Josh
>
>Do you use asm routines for FirstOne(), LastOne() in crafty?
>
>I wrote an inlineppc.h file which can be easily included with only a few line
>changes similar to the existing inlineamd.h or inlinex86.h header files. I can
>post the changes I made, if there's interest.
>The asm routines are triggered by the -DINLINE_PPC (-DINLINE_PPC and -DPPC64 for
>G5) in the command line.
>
>darwinG4:
> $(MAKE) target=FreeBSD \
> CC=gcc CXX=g++ \
> CFLAGS='$(CFLAGS) -Wall -pipe -O3' \
> CXFLAGS=$(CFLAGS) \
> LDFLAGS=$(LDFLAGS) \
> LIBS='-lstdc++' \
> opt='$(opt) -DFUTILITY -DINLINE_PPC -DFAST' \
> crafty-make
>
>darwinG5:
> $(MAKE) target=FreeBSD \
> CC=gcc CXX=g++ \
> CFLAGS='$(CFLAGS) -Wall -pipe -O3 -fast' \
> CXFLAGS=$(CFLAGS) \
> LDFLAGS=$(LDFLAGS) \
> LIBS='-lstdc++' \
> opt='$(opt) -DFUTILITY -DINLINE_PPC -DPPC64 -DFAST' \
> crafty-make
>
>The G4 asm routines give about 5% speedup in bench. Unfortunately I don't have a
>G5, so I could not test these. It would be interesting if somebody could test it
>on a G5. Also if somebody has improvements, let me know.
>
>regards
>Andy
>
>Here is the content of inlineppc.h:
>
>/* 32bit and 64bit asm routines for powerpc,
> * include this file if defined(INLINE_PPC) inside chess.h */
>
>#include <ppc_intrinsics.h>
>
>int static __inline__ PopCnt(register BITBOARD a)
>{
> register int c = 0;
>
> while (a) {
> c++;
> a &= a - 1;
> }
> return (c);
>}
>
>#if defined(PPC64)
>
>/* code for PPC 64 bit */
>int static __inline__ FirstOne(BITBOARD arg1)
>{
> unsigned long index;
>
> __asm__("cntlzd %0, %1" : "=r" (index) : "r" (arg1));
> return index;
>}
>
>int static __inline__ LastOne(BITBOARD arg1)
>{
> unsigned long index;
>
> __asm__("cntlzd %0, %1" : "=r" (index) : "r" (arg1 ^ (arg1 - 1)));
> return index;
>}
>
>#else
>
>/* code for PPC 32 bit */
>
>/* The definition of __cntlzw is not needed when including ppc_intrinsics.h
> * if you don't want to include it, you can uncomment the following define
> *
>#define __cntlzw(a) ({ \
> unsigned long r; \
> __asm__("cntlzw %0,%1" : "=r"(r) : "r"(a)); \
> r;\
>})
>*/
>int static __inline__ FirstOne(BITBOARD arg1)
>{
> unsigned long i;
>
> if ((i = arg1 >> 32))
> return(__cntlzw(i));
> if ((i = (unsigned int) arg1))
> return(__cntlzw(i) + 32);
> return(64);
>}
>
>int static __inline__ LastOne(BITBOARD arg1)
>{
> unsigned long i;
>
> if ((i = (unsigned int) arg1))
> return(__cntlzw(i ^ (i - 1)) + 32);
> if ((i = arg1 >> 32))
> return(__cntlzw(i ^ (i - 1)));
> return(64);
>}
>
>#endif
Hi! A couple of suggestions for the G5 makefile:
For SMP, since almost all G5s are duals :-), add -DSMP -DMUTEX -DPOSIX -DCPUS=2
to opt, and add -lpthread to LIBS (and maybe LDFLAGS).
For gcc codegen optimizations, -fast is actually a bit slower for me than
directly specifying these:
 -mcpu=G5 -mtune=G5 -mpowerpc64 -ffast-math \
 -fstrict-aliasing -fsched-interblock \
 -falign-loops=16 -falign-jumps=16 -falign-labels=16 -falign-functions=16 \
 -malign-natural \
 -fomit-frame-pointer -fasm-blocks
The alignment options are surprisingly effective (around 5% improvement at one
point). The latest version of Shark suggests -falign-loops=32 at some points,
but this slowed crafty down a bit when applied to the whole crafty.c compilation.
Perhaps if I split it up into pieces and used different options ...
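For whoever tests the PPC64 routines on a G5, here is a rough sketch of a
sanity check that could be run before benchmarking. It is only an illustration,
not part of Crafty: ref_first_one, ref_last_one and check_bitscan are made-up
names, BITBOARD is assumed to be an unsigned 64-bit type, and the bit numbering
(bit 0 = most significant bit) is inferred from the cntlzw/cntlzd code quoted
above.

```c
#include <stdint.h>

/* Assumption: Crafty's BITBOARD is an unsigned 64-bit integer. */
typedef uint64_t BITBOARD;

/* Hypothetical reference for FirstOne(): number of leading zero bits,
 * i.e. the index of the first set bit when bit 0 is the most significant
 * bit. Returns 64 for an empty board. */
static int ref_first_one(BITBOARD b)
{
    int n = 0;

    if (b == 0)
        return 64;
    while (!(b & ((BITBOARD) 1 << 63))) {
        b <<= 1;
        n++;
    }
    return n;
}

/* Hypothetical reference for LastOne(): index of the last set bit in the
 * same numbering. Returns 64 for an empty board, like the quoted 32-bit path. */
static int ref_last_one(BITBOARD b)
{
    int n = 63;

    if (b == 0)
        return 64;
    while (!(b & 1)) {
        b >>= 1;
        n--;
    }
    return n;
}

/* Compare candidate routines against the references on every board with
 * one or two bits set. Returns the number of mismatches (0 means they agree). */
static int check_bitscan(int (*first)(BITBOARD), int (*last)(BITBOARD))
{
    int bad = 0, i, j;

    for (i = 0; i < 64; i++) {
        for (j = i; j < 64; j++) {
            BITBOARD b = ((BITBOARD) 1 << i) | ((BITBOARD) 1 << j);

            if (first(b) != ref_first_one(b))
                bad++;
            if (last(b) != ref_last_one(b))
                bad++;
        }
    }
    return bad;
}
```

With inlineppc.h included, check_bitscan(FirstOne, LastOne) should return 0.
The empty board is left out on purpose: as posted, the 64-bit LastOne computes
cntlzd of all ones for arg1 == 0 and so returns 0, while the 32-bit path
returns 64. Whether Crafty ever calls LastOne on an empty board is not
something this sketch checks.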
Frank