Computer Chess Club Archives


Subject: Re: Where is now 64 bits Intel's computer?

Author: Eugene Nalimov

Date: 15:36:33 05/13/02


Bob,

P4 has an 8KB L1 D-cache. It looks like Intel decided that a smaller cache with
1-cycle latency gives them more than a larger, slower cache.

Thanks,
Eugene

On May 13, 2002 at 18:18:01, Robert Hyatt wrote:

>On May 13, 2002 at 09:58:13, Vincent Diepeveen wrote:
>
>>On May 12, 2002 at 13:47:00, Jeremiah Penery wrote:
>>
>>>On May 12, 2002 at 06:42:27, Vincent Diepeveen wrote:
>>>
>>>>On May 12, 2002 at 00:31:51, Martin Andersen wrote:
>>>>
>>>>And it is called McKinley, and on paper what it delivers per MHz is
>>>>impressive.
>>>>
>>>>Just a few details I remember:
>>>>  1GHz, 3MB L2,
>>>
>>>The cache number is wrong.  Itanium (and McKinley) have only 32KB of L1 cache
>>>(16KB code/16KB data).  Itanium has 96K of L2 cache, McKinley has only 256K of
>>>L2.  The 3MB is L3 cache, which is on-chip, with 12-cycle access in McKinley (20
>>>cycles in Itanium).
>>
>>Still impressive, though the L1 cache is a bit disappointing,
>>depending on how big a word in L1 cache is. I assume 64-bit words.
>>
>>That makes the L1 data cache only 2048 words or so.
>>
>>Still twice as big as the P4's!!
>
>Eh?  Late P3's and P4's have had 32kb of L1 cache for a while...  I am not
>sure what you are talking about as a result...
>
>
>
>>
>>>> 6 instructions a clock,
>>>
>>>Theoretically it can execute this, but hardly ever in practice (on integer
>>>code).  The reason is that the instructions must be bundled in groups of 6, and
>>>that Itanium is an _in-order_ processor.  If there aren't 6 instructions it can
>>>bundle together, it has to issue a bunch of no-ops in the bundle.  In addition,
>>>the compiler technology for IA-64 is very immature.  I'm sure with better
>>>compilers they will be able to come closer to that theoretical limit.
>>
>>Well, at the end of 1996 or so they said the same about the Pentium Pro:
>> "who can use 3 instructions a clock?"
>>
>>But back then it was exactly 3x faster for me than a P5-133MHz,
>>which could do only 2 instructions a clock.
>
>For totally different reasons.  The P5 was a very simple 2-way superscalar
>machine.  The P6 core was a very sophisticated 3-way superscalar with out of
>order execution, etc.
>
>Hard to compare them...
>
>
>
>>
>>Of course, the reason DIEP was so much faster on it was that C compilers
>>produce 8-bit + 32-bit code, whereas others who tricked around in 16-bit
>>assembly got nailed, to put it mildly.
>>
>>In short, for C programs like mine this thing might be very fast,
>>especially because it's in order.
>>
>>>> no extreme penalty for misprediction, however; loads of registers, and a big L1 cache.
>>>
>>>There is very little penalty for misprediction, since it has full hardware
>>>predication.  It also has a ton of registers, but it can only access 128(?) at a
>>>time, and the rest it can get through a large rotating register file, which may
>>>have some penalty associated with it, I don't remember specifically.
>>>
>>>As I said above, the L1 cache is actually very small.
>>
>>128 registers kicks butt!
>>
>>The L1 cache is heaven compared to the P4!
>>
>>I assume the L2 cache is better than that of the P4/P3 and at K7
>>level. That takes away some pain too!
>>
>>This sounds like a REAL fast processor for me!!
>>
>>>>What more do you need?
>>>>
>>>>The first CPU was of course not so fast, but just making it was already
>>>>enough to impress the world, because of the price Intel can make a CPU for.
>>>
>>>I'm not sure what you're talking about here.  The Itanium is a very big and very
>>>expensive processor.
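
The L1 word-count arithmetic quoted in the thread (a 16KB data cache holding 64-bit words, compared against an 8KB P4 data cache) can be checked with a short sketch. The cache sizes are simply the figures the posters cite, not independently verified, and the helper name `words_in_cache` is made up for illustration.

```python
# Cache sizes below are the figures quoted in the thread, taken as given.
KB = 1024


def words_in_cache(cache_bytes: int, word_bits: int = 64) -> int:
    """Return how many whole words of word_bits fit in cache_bytes."""
    return (cache_bytes * 8) // word_bits


# Hypothetical labels; sizes as stated by the posters.
l1d_sizes = {
    "Itanium/McKinley L1 data (16KB)": 16 * KB,
    "P4 L1 data (8KB)": 8 * KB,
}

for name, size in l1d_sizes.items():
    print(f"{name}: {words_in_cache(size)} 64-bit words")
```

Under those assumptions the 16KB data cache holds 2048 64-bit words and the 8KB one holds 1024, which matches the "2048 words" and "twice as big" remarks in the quoted text.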



Last modified: Thu, 15 Apr 21 08:11:13 -0700