Computer Chess Club Archives


Subject: Re: Question for the Crafty/Compiler experts

Author: Frank Phillips

Date: 11:26:45 02/19/04


On February 19, 2004 at 13:24:00, Robert Hyatt wrote:

>On February 19, 2004 at 12:15:18, Frank Phillips wrote:
>
>>On February 19, 2004 at 08:56:27, Frank Phillips wrote:
>>
>>>On February 19, 2004 at 08:12:32, Dieter Buerssner wrote:
>>>
>>>>On February 18, 2004 at 20:45:32, Robert Hyatt wrote:
>>>>
>>>>>How are you testing?  IE when I use intel's compiler, with PGO, the inline is
>>>>>faster here.  Not significantly, but still faster...
>>>>
>>>>I used gcc without PGO (was too lazy to do the profile run). I added
>>>>-DINLINE_ASM to the CFLAGS and removed the asm= for the linux target. First I
>>>>had removed -DUSE_ASSEMBLY, but that didn't compile, because then the versions
>>>>in boolean.c would also be compiled. So, I added the -DUSE_ASSEMBLY again (and
>>>>ignored the warning about static declaration follows extern declaration, which
>>>>IMO does not really matter). I did not use icc, because it says:
>>>>
>>>>#   -INLINE_ASM       Compiles with the Intel assembly code for FirstOne(),
>>>>#                     LastOne() and PopCnt().  This is for gcc-style inlining
>>>>#                     and thoroughly breaks the Intel C/C++ compiler at the
>>>>#                     present (version 8.0).
>>>>#
>>>>
>>>>in the Makefile.
>>>>
>>>>Regards,
>>>>Dieter
>>>
>>>
>>>Does it (LastOne()/FirstOne) work in gcc (version 3.3.1 Mandrake Linux 9.2
>>>3.3.1-2mdk) on an AMD XP?
>>>
>>>I get very different node counts, for a fixed depth search, compared to the
>>>array lookup method.  It does not seem to work at all on Intel 8.0 as you say.
>>>Could be a bug in my program of course, but I have not found it yet.  There were
>>>two places I called LastBit() with an empty array, but this was harmless - and I
>>>changed it.
>>>
>>>Frank
>>
>>
>>The different inlinex86.h version in 19.11 works.
>>
>>Array look-up still faster for me on an AMD XP.
>>
>>Frank
>
>
>The code has always worked with an input of zero, so that isn't a problem.  I
>had inadvertently left the old inlinex86.h (or inlineasm.h whatever it was
>called) in the source distribution even though I never used it in a production
>sense.  It was just an experiment done while participating in a discussion here
>on CCC a few months ago, about the apparent slowness of the cmovxx instruction
>on current Intel hardware.
>
>It had one known bug that Dieter pointed out (one clobbered register that the
>compiler might overlook although I would hope that with a dynamic register
>allocation scheme, it would notice the register was getting zapped).  So
>anything could happen depending on the compiler.  When I tried the old inline
>code on intel, it just choked and puked.  The new inline asm works better.
>
>When you say array lookup is still faster, are you talking about
>firstone/lastone or popcnt or all three?  bsf/bsr have always been the fastest
>for me on any platform...

FirstOne() and LastOne().  Although I actually just have a define that switches
between array look-up and (a &= a-1, while loop), and your assembly version.
This is on an AMD.  I keep a note in the source (see below) to remind myself,
but periodically test the assembly again.  This time I did so after I saw the
discussion between Dieter and yourself.

Frank

// Incredibly, this seems faster
// on AMD than Bob's assembly code.
// "On the P2/P3/P4(?) those are like 1 cycle operations,
// but they are very slow (12 cycle latency or something)
// on Athlons."  Jerimiah Penery (CCC, 18 Nov 2002).
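
For anyone who wants to repeat the comparison, here is a minimal sketch of a compile-time switch of the kind described above. It is not the code from my program or from Crafty. All names in it (USE_BUILTIN_BITSCAN, first_one, last_one, pop_count, init_bit_tables) are made up for illustration, and the bit numbering (bit 0 = least significant, 64 returned for an empty board) is an assumption that may not match Crafty's convention. The builtin branch assumes gcc or a compatible compiler, which will normally emit bsf/bsr or an equivalent on x86; the other branch uses 16-bit lookup tables.

```c
#include <stdint.h>

/* Illustrative sketch only; every name here is hypothetical. */

/* Index of the lowest/highest set bit of each non-zero 16-bit value. */
static unsigned char lsb_table[65536];
static unsigned char msb_table[65536];

/* Must be called once before the table-based routines are used. */
static void init_bit_tables(void)
{
    for (unsigned i = 1; i < 65536; i++) {
        unsigned lo = 0, hi = 15;
        while (!((i >> lo) & 1)) lo++;
        while (!((i >> hi) & 1)) hi--;
        lsb_table[i] = (unsigned char)lo;
        msb_table[i] = (unsigned char)hi;
    }
}

#ifdef USE_BUILTIN_BITSCAN
/* gcc-style builtins; undefined for 0, so test for the empty board first. */
static int first_one(uint64_t b) { return b ? __builtin_ctzll(b) : 64; }
static int last_one(uint64_t b)  { return b ? 63 - __builtin_clzll(b) : 64; }
#else
/* Array look-up: scan 16-bit chunks from the low end. */
static int first_one(uint64_t b)
{
    if (b & 0xFFFF)         return lsb_table[b & 0xFFFF];
    if ((b >> 16) & 0xFFFF) return 16 + lsb_table[(b >> 16) & 0xFFFF];
    if ((b >> 32) & 0xFFFF) return 32 + lsb_table[(b >> 32) & 0xFFFF];
    if (b >> 48)            return 48 + lsb_table[b >> 48];
    return 64;              /* empty board */
}

/* Array look-up: scan 16-bit chunks from the high end. */
static int last_one(uint64_t b)
{
    if (b >> 48)            return 48 + msb_table[b >> 48];
    if ((b >> 32) & 0xFFFF) return 32 + msb_table[(b >> 32) & 0xFFFF];
    if ((b >> 16) & 0xFFFF) return 16 + msb_table[(b >> 16) & 0xFFFF];
    if (b & 0xFFFF)         return msb_table[b & 0xFFFF];
    return 64;              /* empty board */
}
#endif

/* The (a &= a-1, while loop) method: each pass clears the lowest set bit. */
static int pop_count(uint64_t a)
{
    int n = 0;
    while (a) {
        a &= a - 1;
        n++;
    }
    return n;
}
```

Build once with -DUSE_BUILTIN_BITSCAN and once without, call init_bit_tables() at start-up, and run the same fixed-depth search with both builds. The node counts should be identical; only the time should differ, and which build wins probably depends on the CPU.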



Last modified: Thu, 15 Apr 21 08:11:13 -0700
