Computer Chess Club Archives


Subject: Re: GCC annihilating VISUAL C++ ==> branchless code in 2003?

Author: Robert Hyatt

Date: 13:51:14 02/28/03


On February 28, 2003 at 15:53:34, Matt Taylor wrote:

>On February 28, 2003 at 13:58:48, Vincent Diepeveen wrote:
>
>>On February 28, 2003 at 11:13:03, Matt Taylor wrote:
>>
>>>On February 28, 2003 at 08:59:08, Vincent Diepeveen wrote:
>>>
>>>>On February 27, 2003 at 15:35:34, Russell Reagan wrote:
>>>>
>>><snip>
>>>>I am bad, however, at reading gcc-generated assembly (it looks SO VERY UGLY,
>>>>similar to the new PGN format of chessbase) and it seems to me it is
>>>>possible that this code can be further optimized. I see no need to put the
>>>>board pointer in eax each time. It's using just 2 registers, whereas the very
>>>>old MSVC is already using 3.
>>>>
>>>>This means that on the Opteron, Itanium2 and similar processors with more than 8
>>>>GPRs, the GCC compiler will suck major ass of course. It doesn't even know how
>>>>to use more than 2 registers!
>>>>
>>>>But in this example it is doing things *branchless*.
>>>>
>>>>So I actually can't wait for a visual c++ edition that uses CMOV* instructions
>>>>and uses profile info to optimize branches.
>>>>
>>>>So in 1 small example we see both the strength of the new generations of
>>>>processors released after 1996 (pentiumpro/klamath and newer) and the
>>>>weakness of the software (visual c++ 6.0, even with service packs, still not
>>>>using P6 instructions although the pentiumpro was released back in 1996) and the
>>>>general inefficiency of the GNU world, which isn't using "640KB should be enough
>>>>RAM" but instead is still using the lemma "2 registers will do".
>>>>
>>>>Best regards,
>>>>Vincent Diepeveen
>>>>diep@xs4all.nl
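
For readers who have not seen the pattern, here is a minimal sketch of what "branchless" means in the post above. The function names are hypothetical and this is not the code from the thread. Whether a compiler turns either form into a CMOV instruction depends on the compiler and on the target it is told to generate code for, so treat this only as an illustration of the idea.

```c
#include <stdint.h>

/* Branching form: as written, this asks for a compare and a conditional jump. */
int32_t max_branchy(int32_t a, int32_t b) {
    if (a > b)
        return a;
    return b;
}

/* Branchless form: turn the comparison result into a mask and blend.
   int32_t is two's complement, so -1 has all bits set. */
int32_t max_branchless(int32_t a, int32_t b) {
    int32_t mask = -(int32_t)(a > b);   /* -1 if a > b, otherwise 0 */
    return (a & mask) | (b & ~mask);    /* picks a when mask is all ones, else b */
}
```

Both functions return the same value for every input; the second simply has no data-dependent jump for the branch predictor to mispredict.
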
>>>
>>>Actually using fewer registers is generally regarded as more optimized. I'm sure
>>
>>Fewer instructions within the 'invariant' (I fear it might be a Dutch term, from
>>a Dutch professor who theoretically proved software; 'invariant' here means
>>all the instructions that get executed within a loop) are excellent, of
>>course. Not loading the pointer within the invariant is trivially
>>faster for most loops.
>
>I hope you mean moving the invariant out of the loop is faster.
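
The point about the pointer load can be sketched in C. This is a hypothetical example, not the DIEP code under discussion: `struct position`, `sum_naive` and `sum_hoisted` are made-up names. An optimizing compiler will often do this hoisting itself when it can prove the loads do not change inside the loop.

```c
#include <stddef.h>

/* Hypothetical engine state; `board` stands in for the pointer being reloaded. */
struct position {
    int   *board;
    size_t n;
};

/* As written, p->board and p->n are read again on every iteration. */
long sum_naive(const struct position *p) {
    long total = 0;
    for (size_t i = 0; i < p->n; i++)
        total += p->board[i];
    return total;
}

/* Loop-invariant loads moved out of the loop: each is done once, up front. */
long sum_hoisted(const struct position *p) {
    const int   *board = p->board;
    const size_t n     = p->n;
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += board[i];
    return total;
}
```

The two functions compute the same sum; the second keeps the pointer and the bound in locals, which is what a register-hungry loop wants.
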
>
>>>that on architectures with billions of registers like Itanium GCC will do just
>>>fine.
>>
>>Fine is a relative statement. I would say horrible. I am very sure GCC's
>>excellent achievements now for DIEP on the K7 are a temporary victory and
>>show very clearly that AMD needs its own compiler team. If GCC's victory is not
>>limited to the K7 then the other compilers would suck ass for 64 bit processors
>>and they will perform worse than a PII at the same clockspeed would do.
>
>AMD doesn't have a budget as big as Intel's. Yes, I think it would be great if
>AMD had their own compiler team. Considering they have been losing millions of
>dollars each quarter, do you think they're very likely to start one soon?
>
>Itanium is -completely- different from x86. I have never had an Itanium on my
>desk to play with, and I don't know how GCC or Intel C perform on it. I would
>still bet that Intel C is the fastest compiler for Itanium. However, that has
>absolutely nothing to do with the K7 or any other x86 processor. That has
>everything to do with GCC's optimizer for the Itanium. Optimization for Itanium
>revolves around instruction scheduling and branch prediction.
>
>>The more registers a processor has the more problems GCC gets into, *trivially*.
>
>What are you talking about? Comparing x86 performance tells you -nothing- about
>how the compiler works for other architectures. Most compiler/architecture
>people are convinced that more registers help the optimizer generate faster
>code.

Does anything he says have to make sense?  Usually it doesn't.  And this is a
good case in point.  GCC does quite well on a SPARC, which has 32 registers (OK,
31, since register 0 always gives a value of zero when referenced.)  It produces
pretty nice code for that machine, _from experience_, as opposed to "from wild
guesswork."

>
>One common technique for doing register allocation optimization is to allow your
>IL (intermediate language) to define an infinite (4.2 billion) number of machine
>registers. Every variable and every computation goes into a register. When the
>IL is translated into machine language for a target machine, the optimizer
>reduces the number of concurrently used registers (by storing variables in
>memory, throwing away computations, etc.) in the IL until it is equal to or below
>the number that the machine supports. An optimizer employing this technique
>would work -better- on a machine with more registers.
>
>-Matt
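
The virtual-register scheme described above can be sketched as a toy allocator. To be clear about assumptions: this uses a simple linear-scan strategy as a stand-in, not whatever GCC or any other compiler does internally, and `struct interval`, `allocate` and `MAX_PHYS` are hypothetical names. Each virtual register is reduced to a live range, and the allocator maps live ranges onto `k` physical registers, spilling to memory when it runs out.

```c
#include <stddef.h>

#define MAX_PHYS 32

/* Live range of one virtual register, in instruction indices.
   The array passed to allocate() must be sorted by start. */
struct interval {
    int start, end;
    int phys;        /* assigned physical register, or -1 if spilled */
};

/* Map n live ranges onto k physical registers.
   Returns how many virtual registers had to be spilled to memory. */
int allocate(struct interval *iv, size_t n, int k) {
    int owner[MAX_PHYS];   /* interval index holding each register, or -1 */
    int spills = 0;

    if (k > MAX_PHYS) k = MAX_PHYS;
    if (k < 0) k = 0;
    for (int r = 0; r < k; r++) owner[r] = -1;

    for (size_t i = 0; i < n; i++) {
        int free_reg = -1, victim = -1;
        for (int r = 0; r < k; r++) {
            /* Release registers whose live range ended before this one starts. */
            if (owner[r] >= 0 && iv[owner[r]].end < iv[i].start)
                owner[r] = -1;
            if (owner[r] < 0) {
                if (free_reg < 0) free_reg = r;
            } else if (victim < 0 || iv[owner[r]].end > iv[owner[victim]].end) {
                victim = r;   /* occupied register whose range ends last */
            }
        }
        if (free_reg >= 0) {
            iv[i].phys = free_reg;
            owner[free_reg] = (int)i;
        } else if (victim >= 0 && iv[owner[victim]].end > iv[i].end) {
            /* Spill the range that lives longest and reuse its register. */
            iv[owner[victim]].phys = -1;
            iv[i].phys = victim;
            owner[victim] = (int)i;
            spills++;
        } else {
            iv[i].phys = -1;   /* no register available: spill the new range */
            spills++;
        }
    }
    return spills;
}
```

For a set of live ranges that all overlap one another, the number of spills this sketch reports is the number of ranges minus `k` (or zero if `k` is large enough), which is the sense in which an allocator built this way has an easier job on a machine with more registers.
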



Last modified: Thu, 15 Apr 21 08:11:13 -0700
