Author: Matt Taylor
Date: 21:14:46 02/08/03
On February 07, 2003 at 08:09:23, Tom Kerrigan wrote:

>On February 07, 2003 at 03:10:46, Matt Taylor wrote:
>
>>There is another subtle difference, too; IA-64 is heavily optimized in software
>>whereas IA-32 is heavily optimized in hardware. In IA-64 it is possible to
>>achieve rates closer to the theoretical 6 instructions per clock than it is on
>>IA-32.
>
>Possibly only because it runs at a much lower clock speed.

Um, possibly because that is the philosophy in VLIW chip design... I stick a bunch of execution units (carefully picked, of course) in my CPU, just as I would if I were building the next Pentium. The difference is that I don't spend a lot of transistors on reordering and such to extract more parallelism; I just let the compiler schedule for my specific mix of units. IA-64 comes much closer to its theoretical speed because of things like predication and its loop counter. (Plus it uses a register stack like SPARC.)

>>The IA-64 is probably extremely nice to compute with (6 MB L2 cache!!) if you
>
>Sort of. A 3GHz P4 outscores a 900MHz McKinley by 67% at SPECint2k, which is
>what's important for computer chess. McKinley is good at SPECfp2k. Maybe that's
>what you're referring to.

No, actually. I have never used a McKinley; I've only seen it on paper. Still, the 3.06 GHz P4 has 512K of L2 cache, and the McKinley has 3 or 6 MB. Now I can't remember whether 6 MB is Itanium-III or McKinley. I have my doubts about such numbers; others in this thread have already posted empirical data showing Itanium is faster than the P4 in chess.

>>Athlon64 will support all of these instructions. Yes, it is a waste when
>>significant portions of the CPU core are dedicated to MMX/SSE and no compiler
>>can generate MMX/SSE code, but an astute assembly programmer can write code for
>
>The Intel compiler can generate SSE2 (instead of x87) for floating point
>calculations. I believe gcc has library functions that make use of MMX.

This is not the same as saying "the compiler can vectorize code." I can hand-tweak my routines all day and then show you how I've made terrific use of MMX and SSE, but I have to use intrinsics or inline assembly to do so. The compiler will -not- generate MMX/SSE instructions to vectorize my code. It will not do 64-bit computations in MMX registers. If the Intel compiler will use scalar SSE, good for them, but none of the other major compilers generate -any- MMX or SSE instruction for C code. I would be surprised if any C compiler does; Intel doesn't go further than scalar FP code.

Using SSE to do scalar floating-point calculations isn't a real big thing, either. It is marginally faster (it eliminates some overhead), but how does that benefit chess?

>I wouldn't say MMX or SSE uses significant portions of the CPU core, relatively
>speaking. The difference between a Pentium and a Pentium MMX is ~1M transistors,
>and probably most of those were devoted to doubling the L1 cache sizes, not to
>MMX functionality. The difference between the Pentium 2 and the Pentium 3 (with
>SSE) is ~2M transistors. I guess you can decide for yourself if these numbers
>are significant.

MMX alone eats roughly 10% of an older Athlon die: about 4M transistors on a 42M transistor chip. 10% is pretty significant.

-Matt
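
For illustration, here is a minimal sketch of the "hand-written intrinsics" approach described above, next to the plain C loop it replaces. It is not code from this thread: the function names add_scalar and add_simd are hypothetical, and it assumes an x86 compiler that provides <emmintrin.h> (SSE2 integer intrinsics rather than the original MMX ones). When SSE2 is not available, the second function simply falls back to the scalar loop.

```c
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics; assumed to be available on x86 */
#endif

/* Plain C: add two arrays of 16-bit values, one element per iteration. */
static void add_scalar(const int16_t *a, const int16_t *b, int16_t *out, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = (int16_t)(a[i] + b[i]);
}

/* Hand-vectorized version (hypothetical name): eight 16-bit adds per
 * instruction, written explicitly by the programmer with intrinsics. */
static void add_simd(const int16_t *a, const int16_t *b, int16_t *out, int n)
{
    int i = 0;
#if defined(__SSE2__)
    for (; i + 8 <= n; i += 8) {
        /* Unaligned loads, so the arrays need no special alignment. */
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(out + i), _mm_add_epi16(va, vb));
    }
#endif
    /* Scalar tail for leftover elements (or everything without SSE2). */
    for (; i < n; i++)
        out[i] = (int16_t)(a[i] + b[i]);
}
```

Both functions produce the same results for inputs whose sums fit in 16 bits; the point is only that the packed version has to be spelled out by hand rather than left to the compiler.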