Computer Chess Club Archives

Subject: Re: advantages versus disadvantage P4

Author: Robert Hyatt
Date: 10:52:23 12/14/02
On December 14, 2002 at 01:38:31, Eugene Nalimov wrote:

>The core principle of HT is "we can add 5% more silicon and get back 10-30%
>speedup in a lot of cases" -- at least according to Intel architects.

That has been what they have been saying/writing.  Duplicating stuff doesn't fit
into that idea nearly as well as just increasing the size of a bottleneck when it
becomes a problem.
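
To put rough numbers on that, here is a minimal sketch of the "speedup per unit of die area" arithmetic. The HT figures are the ones quoted in this thread (about 5% more silicon, 10-30% speedup). The dual-core figures are purely hypothetical (an assumed 2x area with an ideal 2x speedup), and the function name perf_per_area is made up for illustration.

```python
def perf_per_area(speedup, area):
    """Throughput gain per unit of die area, relative to a baseline core (1.0 / 1.0)."""
    return speedup / area

# HT figures as quoted in the thread: ~5% more silicon, 10-30% speedup.
ht_low = perf_per_area(1.10, 1.05)
ht_high = perf_per_area(1.30, 1.05)

# Hypothetical dual-core point (assumption, not from the thread):
# twice the area and an ideal twofold speedup.
cmp_ideal = perf_per_area(2.0, 2.0)

print(f"HT:        {ht_low:.2f} to {ht_high:.2f} (speedup per unit area)")
print(f"Dual core: {cmp_ideal:.2f} (speedup per unit area, assumed ideal scaling)")
```

Under these assumptions HT comes out ahead per unit of silicon even at the low end of the quoted range, while the dual-core part would be faster in absolute terms.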


>
>The moment you start adding extra caches, execution units, etc. for HT you can
>just go the easy way and do CMP -- 2 separate CPU cores on one die. Of course it
>would be faster than HT, but the chip would be much more than 5% bigger.
>
>Thanks,
>Eugene
>
>On December 14, 2002 at 01:09:12, Matt Taylor wrote:
>
>>On December 13, 2002 at 23:05:39, Robert Hyatt wrote:
>>
>>>On December 13, 2002 at 21:16:55, Matt Taylor wrote:
>>>
>>>>On December 12, 2002 at 10:15:16, Vincent Diepeveen wrote:
>>>>
>>>>>On December 11, 2002 at 02:34:33, Matt Taylor wrote:
>>>>>
>>>>>[snip]
>>>>>>Eugene's explanation fits, though. I am surprised that Intel did not duplicate
>>>>>>the trace cache for both logical CPUs. It's like trying to fit an even bigger
>>>>>>peg into an already too small hole...
>>>>>>-Matt
>>>>>
>>>>>Exactly, but the hardware reason to do that is very simple.
>>>>>
>>>>>They can clock the thing to 3.04GHz now. 2.8GHz for the Xeon.
>>>>>
>>>>>But if you double the L1 data cache size or the trace cache size
>>>>>(I won't say which I think is smarter to duplicate; my next
>>>>>sentence shows why) then you have another major problem.
>>>>>
>>>>>You won't be able to clock it to 3.04GHz then, nor to 2.8GHz for the Xeon.
>>>>>
>>>>>If you have something small, you can clock it high.
>>>>>
>>>>>If you have something big like an Itanium2 or the 128KB L1 cache
>>>>>of a K7 then you can't clock it that easily to 3.04GHz.
>>>>>
>>>>>So the clock speed and the size of such important parts integrated into
>>>>>the processor are very closely related.
>>>>
>>>>Actually that's not true. There are some 42 million transistors on the P4
>>>>Northwood -- more than on Athlon or any IA-32 processor prior to it. Yet it
>>>>clocks up higher than any competing chips have. The trick isn't to make things
>>>>simple; it's to split them up.
>>>>
>>>>Two independent trace caches would scale fine without adding significant cost to
>>>>the processor. However, it would impede Intel profit margins because it would
>>>>require a bit of redesign.
>>>>
>>>>-Matt
>>>
>>>
>>>I don't think anyone would want two separate trace caches, as that would violate
>>>the very principle of hyper-threading.  Rather, a larger trace cache, with a
>>>wider path out so that 2x the micro-ops can be spit out at once, would be the
>>>hyper-thread design approach in keeping with the spirit of SMT overall.
>>
>>The core principle of HT is that you have 2 threads using one set of execution
>>units, not necessarily all of the chip's facilities.
>>
>>Doubling the size and width of the u-op cache would work, too, but I think it
>>would be more difficult to build than a mux of two caches.
>>
>>-Matt
Last modified: Thu, 15 Apr 21 08:11:13 -0700