Author: Frank Phillips
Date: 11:26:45 02/19/04
On February 19, 2004 at 13:24:00, Robert Hyatt wrote:

>On February 19, 2004 at 12:15:18, Frank Phillips wrote:
>
>>On February 19, 2004 at 08:56:27, Frank Phillips wrote:
>>
>>>On February 19, 2004 at 08:12:32, Dieter Buerssner wrote:
>>>
>>>>On February 18, 2004 at 20:45:32, Robert Hyatt wrote:
>>>>
>>>>>How are you testing? IE when I use intel's compiler, with PGO, the inline is
>>>>>faster here. Not significantly, but still faster...
>>>>
>>>>I used gcc without PGO (was too lazy to do the profile run). I added
>>>>-DINLINE_ASM to the CFLAGS and removed the asm= for the linux target. First I
>>>>had removed -DUSE_ASSEMBLY, but that didn't compile, because then the versions
>>>>in boolean.c would also be compiled. So, I added the DUSE_ASSEMBLY again (and
>>>>ignored the warning about static declaration follows extern declaration, which
>>>>IMO does not really matter). I did not use icc, because it says:
>>>>
>>>># -INLINE_ASM Compiles with the Intel assembly code for FirstOne(),
>>>># LastOne() and PopCnt(). This is for gcc-style inlining
>>>># and thoroughly breaks the Intel C/C++ compiler at the
>>>># present (version 8.0).
>>>>#
>>>>
>>>>in the Makefile.
>>>>
>>>>Regards,
>>>>Dieter
>>>
>>>
>>>Does it (LastOne()/FirstOne) work in gcc (version 3.3.1 Mandrake Linux 9.2
>>>3.3.1-2mdk) on an AMD XP?
>>>
>>>I get very different node counts, for a fixed depth search, compared to the
>>>array lookup method. It does not seem to work at all on Intel 8.0 as you say.
>>>Could be a bug in my program of course, but I have not found it yet. There were
>>>two places I called LastBit() with an empty array, but this was harmless - and I
>>>changed it.
>>>
>>>Frank
>>
>>
>>The different inlinex86.h version in 19.11 works.
>>
>>Array look-up still faster for me on an AMD XP.
>>
>>Frank
>
>
>The code has always worked with an input of zero, so that isn't a problem. I
>had inadvertently left the old inlinex86.h (or inlineasm.h whatever it was
>called) in the source distribution even though I never used it in a production
>sense. It was just an experiment done while participating in a discussion here
>on CCC a few months ago, about the apparently slowness of the cmovxx instruction
>on current Intel hardware.
>
>It had one known bug that Dieter pointed out (one clobbered register that the
>compiler might overlook although I would hope that with a dynamic register
>allocation scheme, it would notice the register was getting zapped). So
>anything could happen depending on the compiler. When I tried the old inline
>code on intel, it just choked and puked. The new inline asm works better.
>
>When you say array lookup is still faster, are you talking about
>firstone/lastone or popcnt or all three? bsf/bsr have always been the fastest
>for me on any platform...

FirstOne() and LastOne(). Although I actually just have a define that switches
between array look-up, (a &= a-1, while loop), and your assembly version. This
is on an AMD. I keep a note in the source (see below) to remind myself, but
periodically test the assembly again. This time it was after I saw the
discussion between Dieter and yourself.

Frank

// Incredibly this seems faster
// on AMD than Bob's assembly code.
// "On the P2/P3/P4(?) those are like 1 cycle operations,
// but they are very slow (12 cycle latency or something)
// on Athlons." Jerimiah Penery (CCC, 18 Nov 2002).
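
For anyone who wants to try the same comparison, here is a rough sketch of the kind of compile-time switch described above. It is not Crafty's code and not my actual source: the macro USE_BUILTIN_BITSCAN and all function names are made up for illustration, bit 0 is assumed to be the least significant bit (Crafty's own numbering may differ), and the gcc builtins __builtin_ctzll/__builtin_clzll stand in for hand-written bsf/bsr inline assembly. Unlike the routines discussed above, these require a non-zero input.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical switch: 1 = compiler bit-scan builtins, 0 = array look-up. */
#ifndef USE_BUILTIN_BITSCAN
#define USE_BUILTIN_BITSCAN 0
#endif

/* msb_table[v] = index of the highest set bit of the 16-bit value v.
   Entry 0 is unused (left at 0). */
static unsigned char msb_table[65536];

/* Must be called once before any *_lookup function is used. */
static void init_msb_table(void)
{
    unsigned v;
    msb_table[0] = 0;
    msb_table[1] = 0;
    for (v = 2; v < 65536; v++)
        msb_table[v] = (unsigned char)(msb_table[v >> 1] + 1);
}

/* Index (0..63) of the most significant set bit, via 16-bit look-ups. */
static int last_one_lookup(uint64_t b)
{
    assert(b != 0);
    if (b >> 48) return 48 + msb_table[b >> 48];
    if (b >> 32) return 32 + msb_table[(b >> 32) & 0xFFFF];
    if (b >> 16) return 16 + msb_table[(b >> 16) & 0xFFFF];
    return msb_table[b & 0xFFFF];
}

/* Index (0..63) of the least significant set bit: isolate the lowest
   bit with b & -b, then reuse the look-up above. */
static int first_one_lookup(uint64_t b)
{
    assert(b != 0);
    return last_one_lookup(b & (~b + 1));
}

#if USE_BUILTIN_BITSCAN && defined(__GNUC__)
/* gcc/clang builtins; on x86 these typically compile to bsf/bsr. */
static int first_one(uint64_t b) { assert(b != 0); return __builtin_ctzll(b); }
static int last_one(uint64_t b)  { assert(b != 0); return 63 - __builtin_clzll(b); }
#else
static int first_one(uint64_t b) { return first_one_lookup(b); }
static int last_one(uint64_t b)  { return last_one_lookup(b); }
#endif
```

Build once with -DUSE_BUILTIN_BITSCAN=1 and once without, call init_msb_table() at start-up, and compare node counts and timings for a fixed-depth search. The node counts should be identical; if they are not, one of the versions is broken. Which version is faster will depend on the CPU and compiler, as this thread shows.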
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.