Computer Chess Club Archives

Subject: Re: Processor's

Author: Vincent Diepeveen
Date: 15:44:24 06/17/04
On June 17, 2004 at 17:46:47, Eugene Nalimov wrote:

>I gave exact source of the numbers I posted. If you did not notice it, here it
>is once again:

>Intel Itanium 2 Processor Reference Manual For Software Development and
>Optimization, Table 6-4 "Cache Summary".

It differs from what Intel itself shows at *OFFICIAL* meetings.

Typically you quote the best-case sequential figures.

>You can say anything you want about cache sizes, code sizes, price, etc. (you
>like to switch the subject every time somebody points you to the "inaccuracy"),
>but your original statement "Basically opteron has fastest L2 cache which can
>deliver each 13 cycles data (4 reads simultaneously even if i understand well).
>No other processor can deliver data from L2 cache that fast" is obviously wrong.

It is not wrong. Itanium2 does not store its weakest point (instructions) in the
L2 cache but in the L3 cache.

L3 cache needs 17 cycles at 1.4GHz.

a) 1.4GHz is slower than 2.4GHz
b) 17 cycles is more than 13

So you are the one who is wrong here.

In fact it is slower: 17 / 13 * 2.4 / 1.4 = roughly a factor 2.2
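
As a rough check of that arithmetic, the sketch below converts both latencies to
nanoseconds. It assumes only the figures quoted in this thread (13 cycles at
2.4GHz for the Opteron L2, 17 cycles at 1.4GHz for the Itanium2 L3). The name
latency_ns is made up for the example.

```python
def latency_ns(cycles, clock_ghz):
    """Convert a latency in clock cycles to nanoseconds (cycles / GHz)."""
    return cycles / clock_ghz


# Figures as quoted in the thread, not measured here.
opteron_l2_ns = latency_ns(13, 2.4)
itanium2_l3_ns = latency_ns(17, 1.4)

# Same expression as in the post: 17 / 13 * 2.4 / 1.4
ratio = itanium2_l3_ns / opteron_l2_ns

print(f"Opteron L2:  {opteron_l2_ns:.2f} ns")
print(f"Itanium2 L3: {itanium2_l3_ns:.2f} ns")
print(f"Ratio:       {ratio:.2f}")
```

With these inputs the ratio comes out at about 2.24, in line with the factor of
roughly 2 given above.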

Intel just redefined what L2 and L3 do. L2 is a slow-clocked L1, and L3 is
the real L2 of the Itanium2. I probably do not need to mention that instructions
take up more bytes than on x86. The interested public reading this should
realize that too, and also that each block is 2x3 instructions.

All that at just 1.4GHz...
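
To put a number on the instruction size point, here is a small sketch. It
assumes the IA-64 bundle format (128 bits holding 3 instruction slots), plus two
figures quoted further down in this thread: a 16KB L1 instruction cache and 2
bundles issued per clock. The constant names are made up for the example.

```python
BUNDLE_BYTES = 16        # IA-64 bundle: 128 bits (assumption from the architecture)
SLOTS_PER_BUNDLE = 3     # instruction slots per bundle
L1I_BYTES = 16 * 1024    # L1 instruction cache size quoted in the thread
BUNDLES_PER_CLOCK = 2    # issue rate quoted in the thread

bundles = L1I_BYTES // BUNDLE_BYTES
instructions = bundles * SLOTS_PER_BUNDLE
bytes_per_instruction = BUNDLE_BYTES / SLOTS_PER_BUNDLE

# Clocks needed to issue the whole L1I once at the quoted peak rate.
clocks_to_drain = bundles / BUNDLES_PER_CLOCK

print(f"Bundles in L1I:        {bundles}")
print(f"Instruction slots:     {instructions}")
print(f"Bytes per instruction: {bytes_per_instruction:.2f}")
print(f"Clocks at peak issue:  {clocks_to_drain:.0f}")
```

Under these assumptions the 16KB cache holds 1024 bundles, or 3072 instruction
slots at a bit over 5 bytes each.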

>Itanium's L2 cache is faster not only when you look at the cycles count. It is
>*absolutely* faster, too. It is faster even if you'll substitute 7 cycles for L2
>latency.

7 cycles is the sequential read/write figure, which is probably more like 10
cycles for random access. That figure is never quoted in documentation from
Intel: for the P4 it was worked out in practice that it takes 3 cycles in total
to get data, but the official documents quote just 1 cycle of *extra* latency
for lookups, which is what you get with sequential access. Taking CPU clock
speeds into account, it is of course in practice slower than Opteron.
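
The cycles-to-nanoseconds conversion can be sketched for this case too. It uses
only figures from this thread: 7 cycles (sequential) and the guessed 10 cycles
(random access) for the Itanium2 L2 at 1.4GHz, against 13 cycles at 2.4GHz for
the Opteron L2. The 10-cycle value is the estimate made in this post, not a
documented number, and latency_ns is a name made up for the example.

```python
def latency_ns(cycles, clock_ghz):
    """Convert a latency in clock cycles to nanoseconds (cycles / GHz)."""
    return cycles / clock_ghz


# All inputs are figures quoted or estimated in the thread.
opteron_l2 = latency_ns(13, 2.4)
itanium2_l2_sequential = latency_ns(7, 1.4)
itanium2_l2_random_guess = latency_ns(10, 1.4)  # 10 cycles is a guess

for label, value in [
    ("Opteron L2 (13 cycles @ 2.4GHz)", opteron_l2),
    ("Itanium2 L2 (7 cycles @ 1.4GHz)", itanium2_l2_sequential),
    ("Itanium2 L2 (10 cycles @ 1.4GHz)", itanium2_l2_random_guess),
]:
    print(f"{label}: {value:.2f} ns")
```

With these inputs the result depends on which L2 latency is taken: the 7-cycle
case lands slightly below the Opteron figure, the 10-cycle case above it.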

But they do not store instructions inside it AFAIK.

In short you are a factor 2 off.

>Thanks,
>Eugene

>On June 17, 2004 at 14:17:34, Vincent Diepeveen wrote:
>
>>On June 17, 2004 at 13:34:33, Eugene Nalimov wrote:
>>
>>>On June 17, 2004 at 13:29:02, Anthony Cozzie wrote:
>>>
>>>>On June 17, 2004 at 13:20:40, Eugene Nalimov wrote:
>>>>
>>>>>On June 17, 2004 at 06:55:18, Vincent Diepeveen wrote:
>>>>>
>>>>>>[...]
>>>>>>
>>>>>>Please list the processors in order of L2 cache speed and you'll realize that
>>>>>>speed still is of overwhelming importance. List them at random access speed for
>>>>>>L2 cache (some processors are faster in streaming than random access in their
>>>>>>caches like P4).
>>>>>>
>>>>>>Basically opteron has fastest L2 cache which can deliver each 13 cycles data (4
>>>>>>reads simultaneously even if i understand well). No other processor can deliver
>>>>>>data from L2 cache that fast.
>>>>>
>>>>>Intel Itanium 2 Processor Reference Manual For Software Development and
>>>>>Optimization, Table 6-4 "Cache Summary":
>>>>>
>>>>>Itanium2 cache latency:
>>>>>  L1: 1 cycle, 4 loads/cycle
>>Please quote random access times to L1 cache.
>>
>>>>>  L2: 5 cycles (integer loads), 4 loads/cycle
>>Please quote random access times to L2 cache.
>>
>>Note that it is 7 cycles according to Jason Priestly (Intel) when doing
>>*sequential* reads. See his July 2003 seminar for the Dutch supercomputer
>>Aster, which I attended. www.sara.nl
>>
>>>>>  L3: 12/14 cycles, depending on cache size (integer loads), 1 load/cycle
>>14-17 cycles according to jason priestly where 14 is really the 'optimal' case.
>>But that's not *sequential* read.
>>
>>>>>
>>>>>Thanks,
>>>>>Eugene
>>>>>
>>>>
>>>>Correct me if I am wrong, but aren't Itanium's caches off by 1?  In other words,
>>>>the 6MB cache on the Itanium is L3, and the L1 cache is like 1KB?
>>>
>>itanium : L1D: 16KB 1.4Ghz (the 1.5ghz ones are like $5000)
>>Opteron : L1D: 64KB 2.4Ghz
>>
>>>L1I: 16KB ==> blocks of 6 instructions must get used!!!!!
>>Opteron : 64KB and it doesn't need to store blocks of 6 instructions
>>
>>>L2:  256KB ==> and does it store INSTRUCTIONS which is the weak spot ????
>>Opteron : 1MB also storing instructions, itanium?? 13 cycles RANDOM ACCESS
>>
>>Itanium can only execute blocks of 3 instructions and needs to execute 2 blocks
>>each clock. That with just 16 KB instruction cache.
>>
>>So the Itanium has weak spots in other areas too. How can you put
>>double blocks of 6 instructions in 16KB non-stop?
>>
>>You can effectively divide the level caches of itanium by 3.
>>
>>The Intel C++ compiler team said in an interview (www.realworldtech.com) that
>>their big problem is keeping the instruction cache filled.
>>
>>If you can't even keep the instruction cache filled, then what are we talking
>>about?
>>
>>So it's fast for DSP floating point, slightly faster even than Opteron despite
>>Opteron being higher clocked, but for the price of a dual Itanium 1.5GHz you
>>can also buy a quad Opteron 2.4GHz. So for real DSP-like stuff you can buy
>>x times more Opterons for less money.
>>
>>You quote here DSP sequential cycle times.
>>
>>>L3:  1.5/3/6MB
>>
>>There is no advantage in computer chess to having one. Of course for Itanium
>>it's crucial to have one because the L1 and L2 are practically nonexistent.
>>
>>>Thanks,
>>>Eugene
>>>
>>>>It is really amazing to me that Intel can't clock Itanium at 3+ GHZ.
>>>>
>>>>anthony
Re: Processor's Eugene Nalimov 16:13:44 06/17/04
- Re: Processor's Vincent Diepeveen 16:45:53 06/17/04
  - Re: Processor's Eugene Nalimov 17:21:34 06/17/04
    - Re: Processor's Vincent Diepeveen 03:26:47 06/18/04
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.