Author: Jeremiah Penery
Date: 21:15:59 05/13/02
On May 13, 2002 at 09:58:13, Vincent Diepeveen wrote:

>On May 12, 2002 at 13:47:00, Jeremiah Penery wrote:
>
>>On May 12, 2002 at 06:42:27, Vincent Diepeveen wrote:
>>
>In short for C programs like mine this thing might be very fast,
>especially because it's in order.

Can you give one possible reason why in-order execution would ever be faster than out-of-order? The general rule, AFAIK, is that out-of-order execution increases speed about 30% on normal integer code.

In-order execution forces the compiler to do _all_ instruction scheduling. Scheduling for a processor as wide as Itanium must be a nightmare for most code. In addition, "The compiler adds branch hints, register stack and rotation, data and control speculation, and memory hints into EPIC instructions." (quoted from http://www.sharkyextreme.com/hardware/guides/itanium/3.shtml)

Managing all those registers, instruction units, and all that other stuff requires a super-smart compiler. The compiler is not mature enough yet to exploit the full potential of this hardware. I would bet that IA-64 speed could increase by at least 50% by improvement in compiler technology alone.

>>> not extreme penalty however for misprediction, loads of registers, and a big L1 cache.
>>
>>There is very little penalty for misprediction, since it has full hardware

What I said was slightly wrong. The misprediction penalty is the same as for any other processor (the pipeline must be flushed and restarted, 11 cycles or something in Itanium). However, predication allows there to be fewer mispredictions. The way it works is that in an if->then/else situation, the 'then' and the 'else' are computed in parallel, and only the needed result is taken.

>>predication. It also has a ton of registers, but it can only access 128(?) at a
>>time, and the rest it can get through a large rotating register file, which may
>>have some penalty associated with it, I don't remember specifically.
>>
>>As I said above, the L1 cache is actually very small.
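To make the predication point above concrete, here is a rough sketch of the if->then/else idea in Python. It is only a software model, not how IA-64 is actually programmed: the function names and the abs-difference example are made up for illustration, and the multiply-by-predicate select merely stands in for the hardware discarding the result whose predicate is false.

```python
def branchy_abs_diff(a, b):
    # Ordinary control flow: the processor has to predict which arm runs,
    # and a wrong guess costs a pipeline flush.
    if a > b:
        return a - b
    else:
        return b - a


def predicated_abs_diff(a, b):
    # If-conversion (illustrative model): one compare produces a predicate
    # and its complement, both arms are computed, and only the result whose
    # predicate is true is kept. There is no branch left to mispredict.
    p_then = a > b
    p_else = not p_then
    then_result = a - b  # would be guarded by p_then
    else_result = b - a  # would be guarded by p_else
    # bool * int keeps the guarded value and zeroes the other one.
    return p_then * then_result + p_else * else_result
```

The trade-off is that both arms always consume execution resources, so this pays off mainly for short arms behind a hard-to-predict condition.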
>
>128 registers kicks butt!
>
>The L1 cache is heaven compared to the P4!
>
>I assume the L2 cache is better than that of the P4/P3 and at K7
>level. That takes away some pain too!

McKinley has a really nice cache structure, with very low latency. Of course, Intel has always been able to make a really nice cache.

>This sounds like a REAL fast processor for me!!

SpecINT for 800MHz Itanium is 365 - about half of current P4/Athlon numbers. For Crafty, Itanium had a runtime of 252; compare that to the runtime for 1733MHz AthlonXP of 97.8. So the Itanium does 84% as much work on Crafty per clock cycle, and of course the McKinley should be quite a bit better. That wouldn't be too bad if they could clock Itanium at anywhere near x86 speeds.

On FP, Itanium is already quite good, and McKinley will be pretty killer in that department.
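For anyone who wants to check the 84% figure, the arithmetic can be sketched as below. It assumes the two Crafty runtimes are in seconds and that work per clock is simply the inverse of total cycles spent (runtime times clock speed); the function and variable names are just for illustration.

```python
def cycles_used(runtime_seconds, clock_mhz):
    """Total clock cycles spent on the run, in millions of cycles."""
    return runtime_seconds * clock_mhz


# Crafty figures quoted in the post (runtime assumed to be in seconds).
itanium_mcycles = cycles_used(252.0, 800)   # 800MHz Itanium
athlon_mcycles = cycles_used(97.8, 1733)    # 1733MHz AthlonXP

# Same workload on both, so work per clock is inversely proportional
# to the number of cycles each chip needed.
per_clock_ratio = athlon_mcycles / itanium_mcycles

# Wall-clock comparison: how many times longer the Itanium run took.
wall_clock_ratio = 252.0 / 97.8

print(f"Itanium work per clock vs AthlonXP: {per_clock_ratio:.2f}")
print(f"Itanium runtime vs AthlonXP: {wall_clock_ratio:.2f}x")
```

So per clock the two are fairly close; the wall-clock gap comes almost entirely from the clock speed difference.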
Last modified: Thu, 15 Apr 21 08:11:13 -0700