Author: Dieter Buerssner
Date: 12:49:14 02/19/04
On February 19, 2004 at 11:14:30, Robert Hyatt wrote:

>Where did that come from?

I downloaded the tarball of 19.10 yesterday. It was the newest then. I have just downloaded 19.11. Here the C code seems faster again (with icc and gcc; this time I used make profile for icc).

There seems to be an issue with PopCnt similar to the one I mentioned previously:

    int static __inline__ PopCnt(BITBOARD word) {
    /*  r0=result, %1=tmp, %2=first input, %3=second input */
      long  dummy, dummy2;

      asm("        xorl    %0, %0"     "\n\t"
          "        testl   %2, %2"     "\n\t"
          "        jz      2f"         "\n\t"
          "1:      leal    -1(%2), %1" "\n\t"
          "        incl    %0"         "\n\t"
          "        andl    %1, %2"     "\n\t"
                               ^^^
          "        jnz     1b"         "\n\t"
          "2:      testl   %3, %3"     "\n\t"
          "        jz      4f"         "\n\t"
          "3:      leal    -1(%3), %1" "\n\t"
          "        incl    %0"         "\n\t"
          "        andl    %1, %3"     "\n\t"
                               ^^^
          "        jnz     3b"         "\n\t"
          "4:"                         "\n\t"
          : "=&q" (dummy), "=&q" (dummy2)
          : "q" ((int) (word>>32)), "q" ((int) word)
          : "cc");
      return (dummy);
    }

At the indicated points, you change the input registers. The compiler will not notice this (I think). At least it seems to go against what I read some time ago in the gcc manual: operands declared as inputs must not be modified by the asm.

This is what I suggested in the WB forum some time ago. Possibly, you have to change the multi-line string (it worked earlier with gcc). The WB forum software swallowed the indentation ...

http://f11.parsimony.net/forum16635/messages/31324.htm

    int static __inline__ PopCnt(BITBOARD word)
    {
      int tmp, tmp2, n;
      __asm__ __volatile__(
        "movl %3, %1
        xorl %0, %0
        testl %1, %1
        je 1f
        0: incl %0
        leal -1(%1), %2
        andl %2, %1
        jne 0b
        1: movl %4, %1
        testl %1, %1
        je 3f
        2: incl %0
        leal -1(%1), %2
        andl %2, %1
        jne 2b
        3:"
        : "=&r" (n), "=&r" (tmp), "=&r" (tmp2)
        : "g" (*(unsigned long *)&word), "g" (*(((unsigned long *)&word)+1))
        : "cc" /* Flags "condition code" changed */);
      return n;
    }

Note that I used somewhat less restrictive register constraints, which should give the compiler a bit more freedom for optimization. Any register will do for %0, %1 and %2 (not only a/b/c/dx, which is what "q" would select). For the inputs, no register is needed at all (for example, addressing via esp is ok), because each input is first copied into the scratch register %1 and only that copy is modified.

But of course, I think my suggested C version should be preferred. If you use a cast and a shift instead of the horrible casts that take the address here, it would even be totally portable (although not as efficient as possible on 64-bit platforms).

Regards,
Dieter
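For illustration, a portable cast-and-shift variant along the lines mentioned above might look roughly like the following. This is only a minimal sketch, not the C version posted in the WB forum: it assumes BITBOARD is a 64-bit unsigned integer, and the names PopCnt32 and PopCntPortable are hypothetical. Each 32-bit half is counted with the same "clear the lowest set bit" loop that the assembler uses (the leal/andl pair computes w & (w - 1)).

```c
#include <stdint.h>

/* Assumption: BITBOARD is a 64-bit unsigned integer. */
typedef uint64_t BITBOARD;

/* Count the set bits of one 32-bit half. Each pass clears the lowest
   set bit (w &= w - 1), so the loop runs once per set bit. */
static inline int PopCnt32(uint32_t w)
{
    int n = 0;
    while (w) {
        w &= w - 1;
        n++;
    }
    return n;
}

/* Hypothetical name. Splits the 64-bit word with a shift and a cast
   instead of taking its address, so it does not depend on byte order
   or on the size of long. */
static inline int PopCntPortable(BITBOARD word)
{
    return PopCnt32((uint32_t)(word >> 32)) + PopCnt32((uint32_t)word);
}
```

Since the halves are passed by value, the compiler is free to keep them in any registers it likes, and no input operand is ever modified behind its back.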