Computer Chess Club Archives


Subject: Re: Question for the Crafty/Compiler experts

Author: Dieter Buerssner

Date: 12:49:14 02/19/04


On February 19, 2004 at 11:14:30, Robert Hyatt wrote:

>Where did that come from?

Yesterday I downloaded the tarball of 19.10, which was the newest version
then. I have just downloaded 19.11. With it, the C code seems faster again
(with both icc and gcc; this time I used make profile for icc).

There seems to be an issue in PopCnt similar to the one I mentioned
previously:

int static __inline__ PopCnt(BITBOARD word)
{
/*  r0=result, %1=tmp, %2=first input, %3=second input */
  long      dummy, dummy2;

asm("        xorl    %0, %0"                    "\n\t"
    "        testl   %2, %2"                    "\n\t"
    "        jz      2f"                        "\n\t"
    "1:      leal    -1(%2), %1"                "\n\t"
    "        incl    %0"                        "\n\t"
    "        andl    %1, %2"                    "\n\t"
                         ^^^
    "        jnz     1b"                        "\n\t"
    "2:      testl   %3, %3"                    "\n\t"
    "        jz      4f"                        "\n\t"
    "3:      leal    -1(%3), %1"                "\n\t"
    "        incl    %0"                        "\n\t"
    "        andl    %1, %3"                    "\n\t"
                         ^^^
    "        jnz     3b"                        "\n\t"
    "4:"                                        "\n\t"
  : "=&q" (dummy), "=&q" (dummy2)
  : "q" ((int) (word>>32)), "q" ((int) word)
  : "cc");
  return (dummy);
}

At the indicated points, you modify the input registers. The compiler will not
notice this (I think), because operands listed only as inputs are assumed to be
unchanged by the asm. At least it seems to go against what I read some time ago
in the gcc manual.
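
One way to tell gcc that an operand is both read and modified is the "+"
constraint modifier. The following is only a minimal sketch of that idea, not
the Crafty code: the names PopCntHalf and PopCntRW are hypothetical, BITBOARD
is assumed to be a 64-bit unsigned type, and the asm path is only taken with a
gcc-compatible compiler on x86 (a plain C loop is used otherwise).

```c
typedef unsigned long long BITBOARD;  /* assumption: 64-bit unsigned board type */

/* Hypothetical helper: count the set bits of one 32-bit half. */
static int PopCntHalf(unsigned int half)
{
  int n = 0;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
  unsigned int tmp;

  /* %0 = n (read-write), %1 = half (read-write), %2 = tmp (scratch) */
  __asm__("        testl   %1, %1"                    "\n\t"
          "        jz      2f"                        "\n\t"
          "1:      leal    -1(%1), %2"                "\n\t"
          "        incl    %0"                        "\n\t"
          "        andl    %2, %1"                    "\n\t"
          "        jnz     1b"                        "\n\t"
          "2:"
    : "+r" (n), "+r" (half), "=&r" (tmp)  /* "+r": the asm may change these */
    :                                     /* no input-only operands         */
    : "cc");
#else
  /* Fallback for other compilers and targets: clear the lowest set bit. */
  while (half) {
    n++;
    half &= half - 1;
  }
#endif
  return n;
}

/* Hypothetical wrapper: split the 64-bit word with a shift and a cast. */
static int PopCntRW(BITBOARD word)
{
  return PopCntHalf((unsigned int) (word >> 32)) +
         PopCntHalf((unsigned int) word);
}
```

Because half is declared with "+r", the compiler knows its register no longer
holds the original value after the asm, so it will not reuse it as if it were
unchanged.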

This is what I suggested in the WB forum some time ago. You may have to change
the multi-line string (it used to work with gcc). The WB forum software
swallowed the indentation ...

http://f11.parsimony.net/forum16635/messages/31324.htm


int static __inline__ PopCnt(BITBOARD word)
{
 int tmp, tmp2, n;
 __asm__ __volatile__(
 "movl %3, %1
 xorl %0, %0
 testl %1, %1
 je 1f
 0: incl %0
 leal -1(%1), %2
 andl %2, %1
 jne 0b
 1: movl %4, %1
 testl %1, %1
 je 3f
 2: incl %0
 leal -1(%1), %2
 andl %2, %1
 jne 2b
 3:"
 : "=&r" (n), "=&r" (tmp), "=&r" (tmp2)
 : "g" (*(unsigned long *)&word), "g" (*(((unsigned long *)&word)+1))
 : "cc" /* Flags "condition code" changed */);
 return n;
 }

Note that I used somewhat less restrictive register constraints, which should
give the compiler a bit more freedom for optimization. Any register will do for
%0, %1 and %2 (not only a/b/c/dx, which "q" would select). The inputs do not
need to be in registers at all (for example, addressing via esp is OK).

But of course, I think my suggested C version should be preferred. If you use
a cast and a shift instead of the horrible casts that take the address here, it
would even be totally portable (although not as efficient as possible on 64-bit
platforms).
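
A minimal sketch of what such a cast-and-shift version could look like follows.
It is an illustration only, not necessarily the C version posted earlier:
BITBOARD is assumed to be a 64-bit unsigned type, and the name PopCntPortable
is hypothetical.

```c
typedef unsigned long long BITBOARD;  /* assumption: 64-bit unsigned board type */

/* Hypothetical portable variant: no inline asm and no address-taking casts. */
static int PopCntPortable(BITBOARD word)
{
  /* Split the word into two 32-bit halves with a mask, a shift and casts. */
  unsigned long lo = (unsigned long) (word & 0xffffffffUL);
  unsigned long hi = (unsigned long) (word >> 32);
  int n = 0;

  /* x &= x - 1 clears the lowest set bit, so each loop runs once per bit. */
  while (lo) {
    n++;
    lo &= lo - 1;
  }
  while (hi) {
    n++;
    hi &= hi - 1;
  }
  return n;
}
```

Working on two 32-bit halves mirrors the asm versions above. On a 64-bit
platform a single loop over the whole word would avoid the split, which is the
efficiency loss mentioned above.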

Regards,
Dieter



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.