Author: Robert Hyatt
Date: 15:18:01 05/13/02
On May 13, 2002 at 09:58:13, Vincent Diepeveen wrote:

>On May 12, 2002 at 13:47:00, Jeremiah Penery wrote:
>
>>On May 12, 2002 at 06:42:27, Vincent Diepeveen wrote:
>>
>>>On May 12, 2002 at 00:31:51, Martin Andersen wrote:
>>>
>>>And it is called McKinley and on paper it's impressive
>>>what it delivers a Mhz.
>>>
>>>just a few details i remember:
>>> 1Ghz , 3MB L2,
>>
>>The cache number is wrong. Itanium (and McKinley) have only 32KB of L1 cache
>>(16KB code/16KB data). Itanium has 96K of L2 cache, McKinley has only 256K of
>>L2. The 3MB is L3 cache, which is on-chip, with 12-cycle access in McKinley (20
>>cycles in Itanium).
>
>still is impressive, though the L1 cache bit disappointing,
>depending upon how big a word in L1 cache is. I assume 64 bit words.
>
>that makes L1 datacache only 2048 words or so.
>
>Still twice bigger than P4 !!

Eh? Late P3's and P4's have had 32kb of L1 cache for a while... I am not sure
what you are talking about as a result...

>
>>> 6 instructions a clock,
>>
>>Theoretically it can execute this, but hardly ever in practice (on integer
>>code). The reason is that the instructions must be bundled in groups of 6, and
>>that Itanium is an _in-order_ processor. If there aren't 6 instructions it can
>>bundle together, it has to issue a bunch of no-ops in the bundle. In addition,
>>the compiler technology for IA-64 is very immature. I'm sure with better
>>compilers they will be able to come closer to that theoretical limit.
>
>Well at end of 1996 or so they said the same about the pentium pro
> "who can use 3 instructions a clock?"
>
>But it was back then exactly 3x faster for me than a P5-133Mhz
>which could do only 2 instructions a clock.

For totally different reasons. The P5 was a very simple 2-way superscalar
machine. The P6 core was a very sophisticated 3-way superscalar with out of
order execution, etc. Hard to compare them...
>
>Of course the reason why DIEP was so much faster on it, was because of
>the C compilers producing 8 bits + 32 bits code, whereas others
>who tricked around in 16 bits assembly got nailed, to use a small
>understatement.
>
>In short for C programs like mine this thing might be very fast,
>especially because it's in order.
>
>>> not extreme penalty however for misprediction, loads of registers, and a big L1 cache.
>>
>>There is very little penalty for misprediction, since it has full hardware
>>predication. It also has a ton of registers, but it can only access 128(?) at a
>>time, and the rest it can get through a large rotating register file, which may
>>have some penalty associated with it, I don't remember specifically.
>>
>>As I said above, the L1 cache is actually very small.
>
>128 registers kicks butt!
>
>The L1 cache is heaven compared to the P4!
>
>I assume the L2 cache is better than that of the P4/P3 and at K7
>level. That takes away some pain too!
>
>This sounds like a REAL fast processor for me!!
>
>>>What do you need more?
>>>
>>>The first cpu was of course not so fast, but making it already was enough
>>>to impress the world because of the price a cpu intel can make it for.
>>
>>I'm not sure what you're talking about here. The Itanium is a very big and very
>>expensive processor.
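
For anyone wanting to check the "2048 words" figure quoted in the thread, here is a minimal sketch of the arithmetic. It assumes the 16KB L1 data cache size given by Jeremiah and the 64-bit word size Vincent assumes; the function name is made up for illustration and nothing here comes from Intel documentation.

```python
KB = 1024  # bytes per kilobyte


def words_in_cache(cache_bytes: int, word_bits: int) -> int:
    """Return how many whole words of word_bits fit in cache_bytes."""
    bytes_per_word = word_bits // 8
    return cache_bytes // bytes_per_word


# Figures taken from the thread: 16KB of L1 data cache, 64-bit words (assumed).
l1_data_words = words_in_cache(16 * KB, 64)
print(l1_data_words)  # 2048
```

With 32-bit words the same 16KB would hold 4096 entries, so the word size assumed does change the count.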
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.