Author: Robert Hyatt
Date: 12:58:28 02/10/03
On February 10, 2003 at 15:09:18, Tom Kerrigan wrote:

>On February 10, 2003 at 02:41:50, Matt Taylor wrote:
>
>>>It can do _static_ reordering, not dynamic.
>>Reordering is reordering. Optimization at compile-time has more potential than
>>optimization at run-time. Run-time reordering has limited foresight.
>
>More potential, limited foresight, blah blah blah. No matter how many vague
>notions you attribute to IA-64, you still can't explain why it's not faster
>per-clock than several similarly-clocked OOO chips. Arguing with you about this
>is worthless.

I missed something. Where is IA-64 _not_ faster than "similarly clocked OOO chips"? I.e., what 900MHz Pentium can keep up with it? For some programs (such as Crafty), Pentiums clocked 3x faster can't keep up. Someone reported 4GHz PIV numbers (no idea where that comes from, whether it is overclocked or what) that were roughly in line with the 900MHz I2 chip speeds for Crafty...

>
>>Dynamic reordering is valuable when you have a few registers so you can kind've
>>sort've make use of the 40 internal registers on IA-32 chips, but IA-64 has
>>many.

So what?

>
>OOO is said to increase 21264 performance by 30%. The 21264, BTW, has 32
>registers and 40 reorder registers.

I wouldn't doubt that. However, the gain is partially the result of a compiler not doing a very good job of instruction scheduling. The sloppier the compiler is there, the better the OOO architecture will do, since the hardware is simply stepping in for the compiler and doing what should have been done already. Compare a _really_ good scheduling compiler against an OOO architecture and the compiler should win every time, if the compiler is good. It makes perfect sense to let the compiler do the scheduling once, when the cost is irrelevant, rather than having the processor do it every time it fetches the same instruction stream...

This was an issue on the Cray, for example, and their compilers were _very_ good at instruction scheduling, as there was _no_ OOO execution whatsoever.
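To make the scheduling point concrete, here is a toy sketch in Python. It is not taken from any real compiler or chip: it assumes a hypothetical single-issue, in-order machine with made-up latencies (3 cycles for a load, 1 for an add), and the names Instr, in_order_cycles and list_schedule are invented for illustration. It counts the cycles a sloppy instruction order takes, then lets a simple greedy scheduler reorder the same instructions once, ahead of time, and counts again.

```python
from collections import namedtuple

# Hypothetical instruction record: name, destination register,
# source operands, and result latency in cycles (all assumed values).
Instr = namedtuple("Instr", "name dest srcs latency")


def in_order_cycles(program):
    """Cycle at which the last result is ready on a single-issue,
    in-order machine that stalls until an instruction's sources are ready."""
    ready = {}   # register -> cycle at which its value becomes available
    cycle = 0    # next free issue slot
    for ins in program:
        start = max([cycle] + [ready.get(r, 0) for r in ins.srcs])
        ready[ins.dest] = start + ins.latency
        cycle = start + 1
    return max(ready.values())


def list_schedule(program):
    """Greedy static scheduler: repeatedly emit the dependency-free
    instruction that can start earliest (ties go to the longer latency).
    Assumes each register is written once, so only true dependencies exist."""
    produced_here = {ins.dest for ins in program}
    remaining = list(program)
    ready, cycle, out = {}, 0, []

    def start_time(ins):
        return max([cycle] + [ready.get(r, 0) for r in ins.srcs])

    while remaining:
        # An instruction is a candidate once all of its producers were emitted.
        candidates = [i for i in remaining
                      if all(r in ready or r not in produced_here
                             for r in i.srcs)]
        best = min(candidates, key=lambda i: (start_time(i), -i.latency))
        start = start_time(best)
        ready[best.dest] = start + best.latency
        cycle = start + 1
        out.append(best)
        remaining.remove(best)
    return out


# A "sloppy" order: each add sits right behind the load it depends on.
naive = [
    Instr("load r1", "r1", ("a",), 3),
    Instr("add r2", "r2", ("r1", "r1"), 1),
    Instr("load r3", "r3", ("b",), 3),
    Instr("add r4", "r4", ("r3", "r3"), 1),
    Instr("add r5", "r5", ("r2", "r4"), 1),
]

scheduled = list_schedule(naive)
print("naive order:    ", in_order_cycles(naive), "cycles")
print("scheduled order:", in_order_cycles(scheduled), "cycles")
print("new order:      ", [i.name for i in scheduled])
```

In this toy model the reordered stream hoists the second load above the first add, so the same in-order hardware spends fewer cycles stalled. An OOO core would find the same overlap at run time, but it has to rediscover it every time the code executes, while the compiler pays the cost once.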
>
>>Yes. It appears I was looking at a 32-bit Sparc machine. I was reading a paper
>
>Have any 32 bit SPARCs been made since 1995?
>

Hard to classify a SPARC. More like the Athlon 64-bit hybrid than the IA-64, IMHO, although the performance is so bad as to make it worthless for any high-performance applications. Even the Sun O/S (Solaris) has a mixture of 32- and 64-bit stuff all over the place, with most applications being 32-bit.

>>It seems the SPEC scores are generally higher on chips with more cache, and the
>>only McKinley score listed has a 1.5 MB L3 cache.
>
>I can't seem to access SPEC scores right now, but what's the point of a
>super-awesome post-RISC ISA if it's just going to get beat by chips with more
>cache? And if cache really is the limiting factor in McKinley's performance
>here, it must be idle a significant amount of time, which reduces IPC and means
>HT would be beneficial.
>
>>Again, I have no actual experience with an IA-64 machine because they're rather
>>expensive. I can only rely on what I've read. I have never read anything about
>>low IPC on IA-64. Please offer some evidence/article.
>
>It can still be relatively high and benefit from HT.
>
>>In compiler-generated code, my Athlon tends to retire closer to 2 instructions
>>per clock. I would assume that McKinley does better. The restrictions really
>
>Which tool are you using to measure that?
>
>>>>ignoring the Intel C exception of using scalar SSE -- not useful to chess
>>>>programs, not very good justification of SSE either when they could have
>>>>introduced new flat-register FP instructions.)
>>Original SSE is flat-register FP. SSE 2 allows double-precision FP computation.
>
>How do you make these two statements agree?
>
>-Tom
Last modified: Thu, 15 Apr 21 08:11:13 -0700