Computer Chess Club Archives


Subject: Re: Crafty 19.15 inlineppc.h

Author: Frank E. Oldham

Date: 17:43:08 07/13/04


On July 13, 2004 at 19:56:44, Andreas Guettinger wrote:

>On July 13, 2004 at 18:40:29, Joshua Shriver wrote:
>
>>http://www.csee.wvu.edu/~jshriver/chess
>>
>>-Josh
>
>Do you use asm routines for FirstOne(), LastOne() in crafty?
>
>I wrote an inlineppc.h file which can be easily included with only a few line
>changes similar to the existing inlineamd.h or inlinex86.h header files. I can
>post the changes I made, if there's interest.
>The asm routines are enabled by -DINLINE_PPC (-DINLINE_PPC and -DPPC64 for
>G5) on the command line.
>
>darwinG4:
>	$(MAKE) target=FreeBSD \
>		CC=gcc CXX=g++ \
>		CFLAGS='$(CFLAGS) -Wall -pipe -O3' \
>		CXFLAGS=$(CFLAGS) \
>		LDFLAGS=$(LDFLAGS) \
>		LIBS='-lstdc++' \
>		opt='$(opt) -DFUTILITY -DINLINE_PPC -DFAST' \
>		crafty-make
>
>darwinG5:
>	$(MAKE) target=FreeBSD \
>		CC=gcc CXX=g++ \
>		CFLAGS='$(CFLAGS) -Wall -pipe -O3 -fast' \
>		CXFLAGS=$(CFLAGS) \
>		LDFLAGS=$(LDFLAGS) \
>		LIBS='-lstdc++' \
>		opt='$(opt) -DFUTILITY -DINLINE_PPC -DPPC64 -DFAST' \
>		crafty-make
>
>The G4 asm routines give about 5% speedup in bench. Unfortunately I don't have a
>G5, so I could not test these. It would be interesting if somebody could test it
>on a G5. Also, if somebody has improvements, let me know.
>
>regards
>Andy
>
>Here is the content of inlineppc.h:
>
>/* 32bit and 64bit asm routines for powerpc,
> * include this file if defined(INLINE_PPC) inside chess.h */
>
>#include <ppc_intrinsics.h>
>
>int static __inline__ PopCnt(register BITBOARD a)
>{
>  register int c = 0;
>
>  while (a) {
>    c++;
>    a &= a - 1;
>  }
>  return (c);
>}
>
>#if defined(PPC64)
>
>/* code for PPC 64 bit */
>int static __inline__ FirstOne(BITBOARD arg1)
>{
>	unsigned long index;
>
>	__asm__("cntlzd %0, %1" : "=r" (index) : "r" (arg1));
>	return index;
>}
>
>int static __inline__ LastOne(BITBOARD arg1)
>{
>	unsigned long index;
>
>	__asm__("cntlzd %0, %1" : "=r" (index) : "r" (arg1 ^ (arg1 - 1)));
>	return index;
>}
>
>#else
>
>/* code for PPC 32 bit */
>
>/* The definition of __cntlzw is not needed when including ppc_intrinsics.h
> * if you don't want to include it, you can uncomment the following define
> *
>#define  __cntlzw(a) ({ \
>    unsigned long r; \
>    __asm__("cntlzw %0,%1" : "=r"(r) : "r"(a)); \
>    r;\
>})
>*/
>int static __inline__ FirstOne(BITBOARD arg1)
>{
>  unsigned long i;
>
>  if ((i = arg1 >> 32))
>    return(__cntlzw(i));
>  if ((i = (unsigned int) arg1))
>    return(__cntlzw(i) + 32);
>  return(64);
>}
>
>int static __inline__ LastOne(BITBOARD arg1)
>{
>  unsigned long i;
>
>  if ((i = (unsigned int) arg1))
>    return(__cntlzw(i ^ (i - 1)) + 32);
>  if ((i = arg1 >> 32))
>    return(__cntlzw(i ^ (i - 1)));
>  return(64);
>}
>
>#endif
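
The quoted routines number bits from the most significant end, so FirstOne() is a leading-zero count and LastOne() is 63 minus the trailing-zero count. A minimal portable sketch for cross-checking the asm versions on a machine without a G4/G5 follows. The names RefFirstOne, RefLastOne and CheckBitScans are hypothetical and not part of Crafty, and the BITBOARD typedef is an assumption standing in for the one in chess.h.

```c
#include <assert.h>

/* Assumption: BITBOARD is an unsigned 64-bit integer, as in Crafty's chess.h. */
typedef unsigned long long BITBOARD;

/* Hypothetical reference: leading-zero count, i.e. the index of the highest
 * set bit when the MSB is numbered 0. Returns 64 for an empty bitboard. */
static int RefFirstOne(BITBOARD b)
{
  int n = 0;

  if (!b)
    return 64;
  while (!(b & 0x8000000000000000ULL)) {
    b <<= 1;
    n++;
  }
  return n;
}

/* Hypothetical reference: index of the lowest set bit in the same numbering
 * (63 minus the trailing-zero count). Returns 64 for an empty bitboard,
 * like the 32-bit LastOne() above. */
static int RefLastOne(BITBOARD b)
{
  int n = 63;

  if (!b)
    return 64;
  while (!(b & 1)) {
    b >>= 1;
    n--;
  }
  return n;
}

/* Hypothetical self-check: the xor trick used by the asm LastOne(),
 * clz(b ^ (b - 1)), must agree with the loop version for non-empty b. */
int CheckBitScans(void)
{
  static const BITBOARD samples[] = {
    1ULL, 0x8000000000000000ULL, 0x0000000100000000ULL, 0x00F0000000000F00ULL
  };
  unsigned i;

  for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
    BITBOARD b = samples[i];
    assert(RefFirstOne(b ^ (b - 1)) == RefLastOne(b));
  }
  return 1;
}
```

One thing the sketch makes visible: for an empty bitboard, b ^ (b - 1) is all ones, so the PPC64 LastOne() above would return 0 where the 32-bit version returns 64. Whether any caller passes an empty bitboard is not stated in the thread.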

Hi! A couple of suggestions for the G5 makefile:

for smp, since almost all G5s are duals :-), add -DSMP -DMUTEX -DPOSIX -DCPUS=2
and add -lpthread to LIBS (and LDFLAGS maybe)

for gcc codegen optimizations, -fast actually is a bit slower for me than
directly specifying these:
-mcpu=G5 -mtune=G5 -mpowerpc64 -ffast-math \
-fstrict-aliasing -fsched-interblock \
-falign-loops=16 -falign-jumps=16 -falign-labels=16 -falign-functions=16 -malign-natural \
-fomit-frame-pointer -fasm-blocks

the alignment options are surprisingly effective (around 5% improvement at one
point) -- the latest version of Shark suggests -falign-loops=32 at some points,
but this slowed it down a bit when applied to the whole crafty.c compilation --
perhaps if I split it up into pieces and used different options ...
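
As a different technique from splitting the file, GCC and Clang accept an aligned attribute on individual functions, which gives per-function control over entry-point alignment without touching the flags for the rest of the build. This is only a sketch of that alternative: it corresponds to -falign-functions for one function and does not reproduce -falign-loops, and HotPopCnt and EntryIsAligned are hypothetical names, not Crafty routines.

```c
#include <stdint.h>

/* Assumption: BITBOARD is an unsigned 64-bit integer, as in Crafty's chess.h. */
typedef unsigned long long BITBOARD;

/* Hypothetical hot routine. aligned(32) is a GCC/Clang function attribute
 * that requests a 32-byte aligned entry point for this one function only;
 * it does not align the loop inside the body. */
__attribute__((aligned(32)))
int HotPopCnt(BITBOARD a)
{
  int c = 0;

  while (a) {
    c++;
    a &= a - 1;
  }
  return c;
}

/* Hypothetical helper: reports whether the entry point landed on a
 * 32-byte boundary. */
int EntryIsAligned(void)
{
  return ((uintptr_t) HotPopCnt % 32) == 0;
}
```

Whether this helps on a G5 would have to be measured; nothing in the thread tests it.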

Frank



Last modified: Thu, 15 Apr 21 08:11:13 -0700