Author: Vincent Diepeveen
Date: 05:53:47 04/15/03
On April 14, 2003 at 18:24:10, Tom Kerrigan wrote:

>On April 14, 2003 at 16:50:03, Vincent Diepeveen wrote:
>
>>On April 14, 2003 at 16:35:04, Tom Kerrigan wrote:
>>
>>>On April 14, 2003 at 16:06:57, Vincent Diepeveen wrote:
>>>
>>>>On April 14, 2003 at 15:25:15, Tom Kerrigan wrote:
>>>>
>>>>>On April 13, 2003 at 22:58:48, Jeremiah Penery wrote:
>>>>>
>>>>>>>I bet intel will call P4-Prescott to be SMT too instead of CMP. But do you
>>>>>>>really believe it's SMT?
>>>>>>
>>>>>>Um, yes.
>>>>>
>>>>>Heh. Absolutely.
>>>>>
>>>>>Look at the pictures of the die.
>>>>>
>>>>>Do you see 2 CPUs?
>>>>
>>>>>I don't.
>>>>
>>>>Look again. 2 rapid execution engines (cpu's) with each their own 16KB L1 cache:
>>>>
>>>>http://www.chip-architect.com/news/2003_03_06_Looking_at_Intels_Prescott.html
>>>
>>>Ha, you think rapid execution engines are CPUs?
>>>
>>>Then what is all that other stuff on the chip, besides the rapid execution units
>>>and the L2 cache?
>>>Filler?
>>
>>Useless crap i hope or next prescott will dick us again in performance.
>
>Are you kidding me? Are you suggesting that > 50% of the Pentium 4/Prescott is
>"useless crap"? Yeah, that's real likely.
>
>The "rapid execution engine" is basically just the CPU's ALUs. No instruction
>cache, no instruction scheduler, no control logic, no memory logic, no FPU/SIMD
>units, in other words, it's a small fraction of a CPU, not a CPU itself.
>
>There have been several theories about the 2nd rapid execution engine. I favor
>the theory that it's for redundancy, to improve yields. Intel will test both
>units after the chip is made and disable the one that's slower.
>
>>P4 already is dead slow for its price and knowing that SMT hardly can get used
>>as it improves nps too little.
>
>Prescott will double Northwood's out of order resources, and all of Prescott's
>caches are bigger, so it's likely that Prescott will have much better HT
>performance.
>
>>Seeing Trace cache is so big it is understandable they just put 1 copy on the
>>chip of it, but i really regret it for DIEP.
>>
>>Decoding 1 instruction a clock sucks ass. Even my sister can do that faster ;)
>>
>>Any notion that is improved at prescott?
>
>Intel wouldn't have designed the instruction decoder the way it did if it's a
>major performance bottleneck. Intel isn't full of a bunch of idiots who design
>processors by trial and error. Do you even know how often Diep misses the L1
>icache? It's not uncommon to get 99.9% instruction cache hit rates, so you could
>have the slowest decoder in the world (maybe the P4 does) and it wouldn't matter
>for the vast majority of the time.
>
>>Anyway that L1 cache really is nice to have for both chips. 4 instructions a
>>clock is a big improvement and the L1 is improved to 16KB and the tracecach to
>>16k. So that's at least some progress at important points for DIEP.
>
>Prescott is _one_ chip. It has _one_ CPU. It's not clear what the second set of
>ALUs is for or even if more than one set will be enabled, so you should stop
>jumping to conclusions.
>
>-Tom

Implementation matters a lot, but if you build 128 registers in one place and reserve 64 of them for logical CPU 1 and 64 for logical CPU 2, then you can just as well see that as 2 CPUs, each with its own 64 registers. So what SMT speedup to expect simply comes down to how many resources get shared.

Just for fun I will measure the icache hit rate on the P4, if there is a tool that can do that on a running processor. If not, I'll have a look at whether perfmon can do the job.
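To make the register example concrete, here is a toy model. It is hypothetical: the class name `RegisterFile` and its parameters are made up for illustration, and this is not a description of how the P4 actually allocates its rename registers. It models a 128-entry register file that is either split 64/64 between two logical CPUs or fully shared between them.

```python
# Hypothetical toy model: NOT the real P4 register allocation scheme.
# A register file of `total` entries is either statically split between
# logical CPUs or fully shared (competitively allocated).


class RegisterFile:
    def __init__(self, total=128, logical_cpus=2, partitioned=True):
        self.total = total
        self.logical_cpus = logical_cpus
        self.partitioned = partitioned
        # Registers currently held by each logical CPU.
        self.in_use = [0] * logical_cpus

    def limit(self, cpu):
        """Maximum number of registers this logical CPU may hold right now."""
        if self.partitioned:
            # Static split: each logical CPU only ever sees its own slice,
            # exactly as if it were a separate CPU with total/N registers.
            return self.total // self.logical_cpus
        # Fully shared: a CPU may use whatever the others leave free,
        # plus what it already holds.
        return self.total - sum(self.in_use) + self.in_use[cpu]

    def allocate(self, cpu, n):
        """Try to grab n registers for a logical CPU; return how many were granted."""
        granted = max(0, min(n, self.limit(cpu) - self.in_use[cpu]))
        self.in_use[cpu] += granted
        return granted


if __name__ == "__main__":
    split = RegisterFile(partitioned=True)
    print(split.allocate(0, 100))   # capped at its 64-entry slice
    shared = RegisterFile(partitioned=False)
    print(shared.allocate(0, 100))  # can take 100 while CPU 1 is idle
    print(shared.allocate(1, 100))  # only the 28 left over
```

With the static split, a thread running alone still tops out at 64 registers, just as if it were a separate CPU with its own 64 registers. With full sharing, one thread can use up to all 128, but then the two logical CPUs compete for the same pool, which is why the SMT speedup depends on which resources are shared and which are split.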
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.