Computer Chess Club Archives


Subject: Re: how i see SMT

Author: Vincent Diepeveen

Date: 05:53:47 04/15/03


On April 14, 2003 at 18:24:10, Tom Kerrigan wrote:

>On April 14, 2003 at 16:50:03, Vincent Diepeveen wrote:
>
>>On April 14, 2003 at 16:35:04, Tom Kerrigan wrote:
>>
>>>On April 14, 2003 at 16:06:57, Vincent Diepeveen wrote:
>>>
>>>>On April 14, 2003 at 15:25:15, Tom Kerrigan wrote:
>>>>
>>>>>On April 13, 2003 at 22:58:48, Jeremiah Penery wrote:
>>>>>
>>>>>>>I bet intel will call P4-Prescott to be SMT too instead of CMP. But do you
>>>>>>>really believe it's SMT?
>>>>>>
>>>>>>Um, yes.
>>>>>
>>>>>Heh. Absolutely.
>>>>>
>>>>>Look at the pictures of the die.
>>>>>
>>>>>Do you see 2 CPUs?
>>>>>
>>>>>I don't.
>>>>
>>>>Look again. 2 rapid execution engines (cpu's) with each their own 16KB L1 cache:
>>>>
>>>>http://www.chip-architect.com/news/2003_03_06_Looking_at_Intels_Prescott.html
>>>
>>>Ha, you think rapid execution engines are CPUs?
>>>
>>>Then what is all that other stuff on the chip, besides the rapid execution units
>>>and the L2 cache?
>>>Filler?
>>
>>Useless crap i hope or next prescott will dick us again in performance.
>
>Are you kidding me? Are you suggesting that > 50% of the Pentium 4/Prescott is
>"useless crap"? Yeah, that's real likely.
>
>The "rapid execution engine" is basically just the CPU's ALUs. No instruction
>cache, no instruction scheduler, no control logic, no memory logic, no FPU/SIMD
>units, in other words, it's a small fraction of a CPU, not a CPU itself.
>
>There have been several theories about the 2nd rapid execution engine. I favor
>the theory that it's for redundancy, to improve yields. Intel will test both
>units after the chip is made and disable the one that's slower.
>
>>P4 already is dead slow for its price and knowing that SMT hardly can get used
>>as it improves nps too little.
>
>Prescott will double Northwood's out of order resources, and all of Prescott's
>caches are bigger, so it's likely that Prescott will have much better HT
>performance.
>
>>Seeing Trace cache is so big it is understandable they just put 1 copy on the
>>chip of it, but i really regret it for DIEP.
>>
>>Decoding 1 instruction a clock sucks ass. Even my sister can do that faster ;)
>>
>>Any notion that is improved at prescott?
>
>Intel wouldn't have designed the instruction decoder the way it did if it's a
>major performance bottleneck. Intel isn't full of a bunch of idiots who design
>processors by trial and error. Do you even know how often Diep misses the L1
>icache? It's not uncommon to get 99.9% instruction cache hit rates, so you could
>have the slowest decoder in the world (maybe the P4 does) and it wouldn't matter
>for the vast majority of the time.
>
>>Anyway that L1 cache really is nice to have for both chips. 4 instructions a
>>clock is a big improvement and the L1 is improved to 16KB and the trace cache
>>to 16k. So that's at least some progress at important points for DIEP.
>
>Prescott is _one_ chip. It has _one_ CPU. It's not clear what the second set of
>ALUs is for or even if more than one set will be enabled, so you should stop
>jumping to conclusions.
>
>-Tom

Implementation is very important, but if you design 128 registers in one spot
and reserve 64 of them for logical CPU 1 and 64 for logical CPU 2, then you can
also see that as 2 CPUs which each have their own 64 registers.

So estimating what the SMT speedup will be is simply a matter of how many
resources get shared.
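
To make the register argument concrete, here is a minimal sketch of a statically
split register file. The 128 and 64 figures are the ones used above; the names
physical_reg, TOTAL_REGS and REGS_PER_CPU are made up for illustration, and
nothing here claims to describe how the P4 actually allocates its registers.

```python
# Toy model: one physical register file, statically split between logical CPUs.
# All names are hypothetical; only the 128/64 numbers come from the post.
TOTAL_REGS = 128
LOGICAL_CPUS = 2
REGS_PER_CPU = TOTAL_REGS // LOGICAL_CPUS  # 64 registers per logical CPU


def physical_reg(logical_cpu, reg):
    """Map (logical CPU, register number) to an index in the shared file."""
    if not 0 <= logical_cpu < LOGICAL_CPUS:
        raise ValueError("no such logical CPU")
    if not 0 <= reg < REGS_PER_CPU:
        raise ValueError("register outside this CPU's partition")
    return logical_cpu * REGS_PER_CPU + reg
```

With a fixed split like this, neither logical CPU can ever touch the other's
half, so from the software side it looks the same as two separate 64-register
files even though it is one structure on the die.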

For fun I will measure the icache hit rate on the P4, if there is a tool that
can do so on a running processor. If not, I'll have a look at whether perfmon
can do the job.
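
Once two counter values are available (total instruction fetches and fetch
misses), turning them into a hit rate is simple arithmetic. The sketch below
assumes exactly those two raw counts; the function names and the miss penalty
parameter are hypothetical and do not correspond to any specific P4 event or
perfmon counter.

```python
# Hypothetical helpers: convert raw fetch/miss counts into a hit rate and an
# average per-fetch cost. The miss penalty is an input you must supply yourself.
def hit_rate(fetches, misses):
    """Fraction of instruction fetches that were served from the cache."""
    if fetches <= 0:
        raise ValueError("fetches must be positive")
    if not 0 <= misses <= fetches:
        raise ValueError("misses must be between 0 and fetches")
    return 1.0 - misses / fetches


def avg_extra_cycles(fetches, misses, miss_penalty_cycles):
    """Average extra cycles per fetch caused by misses."""
    return (1.0 - hit_rate(fetches, misses)) * miss_penalty_cycles
```

For example, 1,000 misses in 1,000,000 fetches is the 99.9% hit rate mentioned
in the quoted text. With an assumed penalty of 20 cycles per miss, that averages
out to about 0.02 extra cycles per fetch.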



