Author: Eugene Nalimov
Date: 15:36:33 05/13/02
Bob, the P4 has an 8KB L1 D-cache. It looks like Intel decided that a smaller cache with 1-cycle latency gives them more than a larger, slower one.

Thanks,
Eugene

On May 13, 2002 at 18:18:01, Robert Hyatt wrote:

>On May 13, 2002 at 09:58:13, Vincent Diepeveen wrote:
>
>>On May 12, 2002 at 13:47:00, Jeremiah Penery wrote:
>>
>>>On May 12, 2002 at 06:42:27, Vincent Diepeveen wrote:
>>>
>>>>On May 12, 2002 at 00:31:51, Martin Andersen wrote:
>>>>
>>>>And it is called McKinley and on paper it's impressive
>>>>what it delivers per MHz.
>>>>
>>>>Just a few details I remember:
>>>> 1GHz, 3MB L2,
>>>
>>>The cache number is wrong. Itanium (and McKinley) have only 32KB of L1 cache
>>>(16KB code/16KB data). Itanium has 96K of L2 cache, McKinley has only 256K of
>>>L2. The 3MB is L3 cache, which is on-chip, with 12-cycle access in McKinley (20
>>>cycles in Itanium).
>>
>>Still impressive, though the L1 cache is a bit disappointing,
>>depending upon how big a word in L1 cache is. I assume 64-bit words.
>>
>>That makes the L1 data cache only 2048 words or so.
>>
>>Still twice as big as the P4's!!
>
>Eh? Late P3's and P4's have had 32KB of L1 cache for a while... I am not
>sure what you are talking about as a result...
>
>>
>>>> 6 instructions a clock,
>>>
>>>Theoretically it can execute this, but hardly ever in practice (on integer
>>>code). The reason is that the instructions must be bundled in groups of 6, and
>>>that Itanium is an _in-order_ processor. If there aren't 6 instructions it can
>>>bundle together, it has to issue a bunch of no-ops in the bundle. In addition,
>>>the compiler technology for IA-64 is very immature. I'm sure with better
>>>compilers they will be able to come closer to that theoretical limit.
>>
>>Well, at the end of 1996 or so they said the same about the Pentium Pro:
>> "who can use 3 instructions a clock?"
>>
>>But back then it was exactly 3x faster for me than a P5-133MHz,
>>which could do only 2 instructions a clock.
>
>For totally different reasons. The P5 was a very simple 2-way superscalar
>machine. The P6 core was a very sophisticated 3-way superscalar with out-of-order
>execution, etc.
>
>Hard to compare them...
>
>>
>>Of course the reason why DIEP was so much faster on it was because of
>>the C compilers producing 8-bit + 32-bit code, whereas others
>>who tricked around in 16-bit assembly got nailed, to use a small
>>understatement.
>>
>>In short, for C programs like mine this thing might be very fast,
>>especially because it's in order.
>>
>>>> not extreme penalty however for misprediction, loads of registers, and a big L1 cache.
>>>
>>>There is very little penalty for misprediction, since it has full hardware
>>>predication. It also has a ton of registers, but it can only access 128(?) at a
>>>time, and the rest it can get through a large rotating register file, which may
>>>have some penalty associated with it, I don't remember specifically.
>>>
>>>As I said above, the L1 cache is actually very small.
>>
>>128 registers kicks butt!
>>
>>The L1 cache is heaven compared to the P4!
>>
>>I assume the L2 cache is better than that of the P4/P3 and at K7
>>level. That takes away some pain too!
>>
>>This sounds like a REAL fast processor for me!!
>>
>>>>What do you need more?
>>>>
>>>>The first CPU was of course not so fast, but making it already was enough
>>>>to impress the world because of the price a CPU Intel can make it for.
>>>
>>>I'm not sure what you're talking about here. The Itanium is a very big and very
>>>expensive processor.
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.