Computer Chess Club Archives


Subject: Re: how i see SMT

Author: Tom Kerrigan

Date: 15:24:10 04/14/03


On April 14, 2003 at 16:50:03, Vincent Diepeveen wrote:

>On April 14, 2003 at 16:35:04, Tom Kerrigan wrote:
>
>>On April 14, 2003 at 16:06:57, Vincent Diepeveen wrote:
>>
>>>On April 14, 2003 at 15:25:15, Tom Kerrigan wrote:
>>>
>>>>On April 13, 2003 at 22:58:48, Jeremiah Penery wrote:
>>>>
>>>>>>I bet intel will call P4-Prescott to be SMT too instead of CMP. But do you
>>>>>>really believe it's SMT?
>>>>>
>>>>>Um, yes.
>>>>
>>>>Heh. Absolutely.
>>>>
>>>>Look at the pictures of the die.
>>>>
>>>>Do you see 2 CPUs?
>>>
>>>>I don't.
>>>
>>>Look again. 2 rapid execution engines (cpu's) with each their own 16KB L1 cache:
>>>
>>>http://www.chip-architect.com/news/2003_03_06_Looking_at_Intels_Prescott.html
>>
>>Ha, you think rapid execution engines are CPUs?
>>
>>Then what is all that other stuff on the chip, besides the rapid execution units
>>and the L2 cache?
>>Filler?
>
>Useless crap i hope or next prescott will dick us again in performance.

Are you kidding me? Are you suggesting that more than 50% of the Pentium
4/Prescott is "useless crap"? Yeah, that's real likely.

The "rapid execution engine" is basically just the CPU's ALUs. No instruction
cache, no instruction scheduler, no control logic, no memory logic, no FPU/SIMD
units. In other words, it's a small fraction of a CPU, not a CPU itself.

There have been several theories about the 2nd rapid execution engine. I favor
the theory that it's for redundancy, to improve yields. Under that theory, Intel
would test both units after the chip is made and disable the slower one.

>P4 already is dead slow for its price and knowing that SMT hardly can get used
>as it improves nps too little.

Prescott will double Northwood's out-of-order resources, and all of Prescott's
caches are bigger, so it's likely that Prescott will have much better HT
performance.

>Seeing Trace cache is so big it is understandable they just put 1 copy on the
>chip of it, but i really regret it for DIEP.
>
>Decoding 1 instruction a clock sucks ass. Even my sister can do that faster ;)
>
>Any notion that is improved at prescott?

Intel wouldn't have designed the instruction decoder the way it did if it were a
major performance bottleneck. Intel isn't full of idiots who design processors
by trial and error. Do you even know how often Diep misses the L1 icache? It's
not uncommon to get 99.9% instruction cache hit rates, so you could have the
slowest decoder in the world (maybe the P4 does) and it wouldn't matter the vast
majority of the time.
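
To put rough numbers on the hit-rate argument, here is a minimal Python sketch.
The function name and the per-instruction costs are illustrative assumptions
(3 instructions per clock delivered on a cache hit, 1 per clock from a slow
decoder on a miss), not measurements of Diep or of any real chip. It also
ignores the extra latency of fetching the missed code, so treat it as
back-of-the-envelope arithmetic only.

```python
def avg_cycles_per_instruction(hit_rate, hit_cost, miss_cost):
    """Average front-end cycles per instruction, weighted by cache hit rate."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_cost + (1.0 - hit_rate) * miss_cost


# All three costs below are hypothetical, chosen only for illustration.
HIT_COST = 1.0 / 3.0        # assumed: 3 instructions per clock on a hit
SLOW_MISS_COST = 1.0        # assumed: decoder manages 1 instruction per clock
FAST_MISS_COST = 1.0 / 3.0  # assumed: a decoder as fast as the hit path

HIT_RATE = 0.999            # the 99.9% hit rate mentioned in the post

slow = avg_cycles_per_instruction(HIT_RATE, HIT_COST, SLOW_MISS_COST)
fast = avg_cycles_per_instruction(HIT_RATE, HIT_COST, FAST_MISS_COST)

# Relative front-end cost of the slow decoder versus the fast one.
slowdown_percent = (slow / fast - 1.0) * 100.0
print(f"slow decoder: {slow:.4f} cycles/instruction")
print(f"fast decoder: {fast:.4f} cycles/instruction")
print(f"slowdown:     {slowdown_percent:.2f}%")
```

Under these assumed costs the slow decoder adds about 0.2% to the average
front-end cost per instruction. Lower HIT_RATE to see how quickly the decoder
starts to matter once misses become common.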

>Anyway that L1 cache really is nice to have for both chips. 4 instructions a
>clock is a big improvement and the L1 is improved to 16KB and the trace cache to
>16k. So that's at least some progress at important points for DIEP.

Prescott is _one_ chip. It has _one_ CPU. It's not clear what the second set of
ALUs is for or even if more than one set will be enabled, so you should stop
jumping to conclusions.

-Tom




