Author: Tom Kerrigan
Date: 00:21:45 02/09/03
On February 09, 2003 at 00:14:46, Matt Taylor wrote:

>On February 07, 2003 at 08:09:23, Tom Kerrigan wrote:
>
>>On February 07, 2003 at 03:10:46, Matt Taylor wrote:
>>
>>>There is another subtle difference, too; IA-64 is heavily optimized in software
>>>whereas IA-32 is heavily optimized in hardware. In IA-64 it is possible to
>>>achieve rates closer to the theoretical 6 instructions per clock than it is on
>>>IA-32.
>>
>>Possibly only because it runs at a much lower clock speed.
>
>Um, possibly because that is the philosophy in VLIW chip design...
>
>I stick a bunch of execution units (carefully picked, of course) in my CPU, just
>as I would if I were building the next Pentium. The difference is that I don't
>waste a lot of transistors on reordering and such to get more parallelism; I
>just let the compiler optimize for my specific mix.
>
>IA-64 comes much closer to theoretical speed because of things like predication
>and its loop counter. (Plus it uses a register stack like Sparc.)

You're assuming that software scheduling does a better job than hardware scheduling but you have no data to back up that assumption.

Prefetching and predication are very poor substitutes for out-of-order execution. They make writing software (or at least compilers) more difficult and they often waste valuable memory bandwidth and execution units.

As for the SPARC register stack, it's widely accepted that it doesn't significantly improve performance and it makes the register file big enough to hurt clock speed (which is one of the main reasons why IA-64 chips are clocked so slow). It all but prevents register file duplication or caching, like in Alphas...

>No, actually. I have never used a McKinley; I've only seen it on paper. Still,
>the P4 3.06 GHz has 512K of L2 cache, and the McKinley has 3 or 6 MB. Now I
>can't remember whether 6 MB is Itanium-III or McKinley.

Doesn't matter for computer chess. Every program I know about (with the exception of HIARCS) has a working set of < 256k.

>>>significant portions of the CPU core are dedicated to MMX/SSE and no compiler
>>>can generate MMX/SSE code, but an astute assembly programmer can write code
>>
>>The Intel compiler can generate SSE2 (instead of x87) for floating point
>>calculations. I believe gcc has library functions that make use of MMX.
>
>This is not the same as saying "the compiler can vectorize code." I can

Right. You said generate MMX/SSE code, not vectorize code.

>MMX alone eats more than 10% of an older Athlon die -- about 4M transistors on a
>42M transistor chip. 10% is pretty significant.

Where did you get that number?

-Tom
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.