Author: Robert Hyatt
Date: 13:51:14 02/28/03
On February 28, 2003 at 15:53:34, Matt Taylor wrote:

>On February 28, 2003 at 13:58:48, Vincent Diepeveen wrote:
>
>>On February 28, 2003 at 11:13:03, Matt Taylor wrote:
>>
>>>On February 28, 2003 at 08:59:08, Vincent Diepeveen wrote:
>>>
>>>>On February 27, 2003 at 15:35:34, Russell Reagan wrote:
>>>>
>>><snip>
>>>>I am bad however in reading gcc generated assembly (it looks SO VERY UGLY,
>>>>similar to the new PGN format of chessbase) and it seems to me it is
>>>>possible that this code can be further optimized. I see no need to put the
>>>>board pointer in eax each time. It's using just 2 registers versus very old
>>>>MSVC is already using 3.
>>>>
>>>>Means that at the Opteron and Itanium2 and such processors with more than 8
>>>>GPRs, the GCC compiler will suck major ass of course. It doesn't even know how
>>>>to use more than 2 registers!
>>>>
>>>>But in this example it is doing things *branchless*.
>>>>
>>>>So i can't actually wait for a visual c++ edition to use CMOV* instructions
>>>>and using profile info to optimize branches.
>>>>
>>>>So in 1 small example we see both the strength of the new generations of
>>>>processors released after 1996 (pentiumpro/klamath and newer) and the
>>>>weakness of the software (visual c++ 6.0 despite pentiumpro released
>>>>in 1996 already still with service packs not using P6 instructions) and the
>>>>general inefficiency of the GNU world who isn't using "640KB should be enough
>>>>RAM", but instead still is using the lemma "2 registers will do".
>>>>
>>>>Best regards,
>>>>Vincent Diepeveen
>>>>diep@xs4all.nl
>>>
>>>Actually using fewer registers is generally regarded as more optimized. I'm sure
>>
>>less instructions within the 'invariant' (i fear it might be a dutch word of a
>>dutch professor who theoretically proved software and 'invariant' is describing
>>all instructions which are getting executed within a loop) is excellent of
>>course. Not doing the loading of the pointer within the invariant is trivially
>>faster for most loops.
>
>I hope you mean moving the invariant out of the loop is faster.
>
>>>that on architectures with billions of registers like Itanium GCC will do just
>>>fine.
>>
>>fine is a relative statement. I would say horrible. I am very sure GCC's
>>excellent achievements now for DIEP at the k7 is a temporarily victory and
>>showing very clearly AMD needs its own compiler team. If GCC's victory is not
>>limited to the K7 then the other compilers would suck ass for 64 bit processors
>>and they will perform worse than a PII at the same clockspeed would do.
>
>AMD doesn't have a budget as big as Intel's. Yes, I think it would be great if
>AMD had their own compiler team. Considering they have been losing millions of
>dollars each quarter, do you think they're very likely to start one soon?
>
>Itanium is -completely- different from x86. I have never had an Itanium on my
>desk to play with, and I don't know how GCC or Intel C perform on it. I would
>still bet that Intel C is the fastest compiler for Itanium. However, that has
>absolutely nothing to do with the K7 or any other x86 processor. That has
>everything to do with GCC's optimizer for the Itanium. Optimization for Itanium
>revolves around instruction scheduling and branch prediction.
>
>>The more registers a processor has the more problems GCC gets into, *trivially*.
>
>What are you talking about? Comparing x86 performance tells you -nothing- about
>how the compiler works for other architectures. Most compiler/architecture
>people are convinced that more registers help the optimizer generate faster
>code.

Does anything he says have to make sense? Usually it doesn't. And this is a
good case in point. GCC does quite well on a sparc, which has 32 registers
(OK, 31 since register 0 always gives a value of zero when referenced.) It
produces pretty nice code for that machine, _from experience_, as opposed to
"from wild guesswork."

>
>One common technique for doing register allocation optimization is to allow your
>IL (intermediate language) to define an infinite (4.2 billion) number of machine
>registers. Every variable and every computation goes into a register. When the
>IL is translated into machine language for a target machine, the optimizer
>reduces the number of concurrently used registers (by storing variables in
>memory, throwing away computations, etc.) in the IL until it is equal or below
>the number that the machine supports. An optimizer employing this technique
>would work -better- on a machine with more registers.
>
>-Matt
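
To make the register-allocation technique Matt describes concrete, here is a
toy C sketch (not taken from GCC or any real compiler) of the last step: the IL
gets an unlimited supply of "virtual" registers, each live over some interval,
and the back end maps them onto a small fixed set of real registers, spilling
whatever doesn't fit. The register count, interval numbers, and names are all
invented for the example.

/* Toy sketch (not GCC's code): map IL "virtual" registers, each with a live
   interval, onto a few real registers, spilling the ones that don't fit.
   All names and numbers here are invented. */
#include <stdio.h>
#include <stdlib.h>

#define NUM_REAL_REGS 3                    /* hypothetical machine with 3 free GPRs */

typedef struct {
    const char *name;                      /* made-up virtual register name */
    int start, end;                        /* live interval, in instruction numbers */
} VReg;

static int cmp_start(const void *a, const void *b) {
    return ((const VReg *)a)->start - ((const VReg *)b)->start;
}

int main(void) {
    VReg v[] = { {"v0",0,9}, {"v1",1,3}, {"v2",2,7},   /* made-up IL values */
                 {"v3",3,5}, {"v4",4,8}, {"v5",6,9} };
    int n = (int)(sizeof v / sizeof v[0]);
    int busy_until[NUM_REAL_REGS];         /* last point each real register is in use */
    int i, r;

    for (r = 0; r < NUM_REAL_REGS; r++)
        busy_until[r] = -1;
    qsort(v, n, sizeof v[0], cmp_start);   /* walk the values in definition order */

    for (i = 0; i < n; i++) {
        int assigned = -1;
        for (r = 0; r < NUM_REAL_REGS; r++)
            if (busy_until[r] < v[i].start) {    /* this real register is free again */
                assigned = r;
                busy_until[r] = v[i].end;
                break;
            }
        if (assigned >= 0)
            printf("%s -> real register r%d\n", v[i].name, assigned);
        else
            printf("%s -> spilled to memory\n", v[i].name);
    }
    return 0;
}

With NUM_REAL_REGS raised, fewer values spill and less traffic goes through
memory, which is the sense in which more architectural registers make the
optimizer's job easier.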
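
On the loop-invariant point earlier in the thread (re-loading the board pointer
inside the loop versus loading it once), here is a minimal C sketch. The
structure, field names, and global pointer are made up for illustration; they
are not from DIEP, Crafty, or the code Vincent was compiling.

/* Minimal sketch of hoisting a loop-invariant load.  The structure and the
   global pointer are hypothetical, invented only for this example. */
struct position { int board[64]; };      /* hypothetical board layout */
struct position *current_pos;            /* hypothetical global pointer */

/* Naive form: without alias information the compiler may re-load
   current_pos (the "board pointer in eax") on every iteration, because
   as far as it knows the store could modify the pointer itself. */
void clear_board_naive(void) {
    int sq;
    for (sq = 0; sq < 64; sq++)
        current_pos->board[sq] = 0;
}

/* Hoisted form: the invariant load happens once, before the loop --
   "moving the invariant out of the loop", as Matt puts it. */
void clear_board_hoisted(void) {
    struct position *p = current_pos;
    int sq;
    for (sq = 0; sq < 64; sq++)
        p->board[sq] = 0;
}

Under strict-aliasing rules a compiler is allowed to hoist that load on its
own, but compilers of that era often didn't, so hoisting it by hand in the
source still paid off.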
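
Finally, on the CMOV/branchless point: a simple conditional like the one below
is the textbook candidate. The assembly in the comment is only an illustrative
rendering of what a P6-aware compiler may emit, not output taken from the
thread.

/* Branchless selection: a P6-aware compiler can turn this ternary into a
   conditional move instead of a compare-and-branch. */
int max_score(int a, int b) {
    return (a > b) ? a : b;
    /* Roughly:   mov   eax, [a]
                  cmp   eax, [b]
                  cmovl eax, [b]    ; take b when a < b -- no branch to mispredict */
}

Visual C++ 6.0 stuck to a pre-P6 instruction baseline even in its service
packs, which is exactly the complaint Vincent is making.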