Author: Eugene Nalimov
Date: 20:53:32 02/09/03
On February 09, 2003 at 23:31:35, Tom Kerrigan wrote:

>On February 09, 2003 at 22:19:12, Matt Taylor wrote:
>
>>The compiler has time to evaluate many different orderings. Furthermore, the
>>compiler has more flexibility in reordering; the processor can only reorder
>
>It can do _static_ reordering, not dynamic.
>
>>>If static scheduling is better than dynamic, why does McKinley deliver fewer
>>>SPECint/GHz than the similarly clocked 21364, SPARC64, and PA-RISC 8700 chips?
>>Actually, last I checked Sparc-64 scored a tad lower than IA-32 at the same
>>clock speed. I was looking just yesterday and noted that Sparc scores were
>>rather low.
>
>Actually, last I checked was 1 second ago:
>http://www.aceshardware.com/SPECmine/index.jsp?b=0&s=2&v=1&if=0&r1f=2&r2f=0&m1f=0&m2f=0&o=0&o=1
>
>>You are right that HT is pointless on VLIW chips. How is this a weakness? It
>>means the chip is already efficient enough that HT would not help it. That is
>>the point of VLIW computing! You don't need things like HT because your machine
>>is -already- efficient. Conversely IA-32 is weak in the area of efficiency.
>
>Nonsense. How many times are those IA-64 instruction bundles padded with NOPs
>because there isn't enough ILP to fill them or fit the pairing restrictions?
>Each one of those NOPs means an idle execution unit that could be devoted to
>processing another thread. Also, IA-64 suffers from memory latency just as much
>as the next chip. (Well, moreso, because it's in-order.) All that idle time
>waiting on memory could be spent processing another thread that's probably in
>cache.
>
>>IA-64 can do up to 6 instructions per cycle; the best IA-32 offers is 3. Again,
>
>Which doesn't matter. The Pentium III retires about 1.2 instructions per clock
>and McKinley doesn't do much better.
>
>>>That has nothing to do with the high latency of the register file caused by
>>>having so damn many registers.
>>Is this an assumption, or do you have proof? It would be an awkward machine
>>indeed if general register accesses weren't 1-cycle.
>
>What is your explanation of McKinley's low clock speed? The SPARC's huge
>register file is often blamed for its low clock speed.

Some time ago I talked with Intel's Itanium2 architect, and he gave a *different* explanation for the relatively low clock speed. Sorry, I cannot repeat it here.

BTW, you should compare Itanium2's clock speed with that of *server* CPUs, not desktop ones. When Itanium2 was released, its clock speed looked reasonably fast. And faster Itaniums will be released soon...

Thanks,
Eugene

>>I did not mean to imply that the entire application fit in 256KB. However,
>
>Imply? That's exactly what you said.
>
>>You said 1M for MMX and 2M for SSE. 1+2=3. I lump media instructions together
>>because both MMX and SSE are equally useless to compilers. (Of course I'm
>>ignoring the Intel C exception of using scalar SSE -- not useful to chess
>>programs, not very good justification of SSE either when they could have
>>introduced new flat-register FP instructions.)
>
>SSE2 is flat-register FP instructions...
>
>-Tom
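To illustrate Tom's closing point that SSE2 already provides flat-register floating point, here is a minimal sketch, assuming an x86 compiler that ships the SSE2 intrinsics header `<emmintrin.h>`. The function name `scalar_sse2_dot` is hypothetical. Each `_sd` intrinsic maps to a scalar instruction (MOVSD, MULSD, ADDSD) that works on the low double of an XMM register named directly as an operand, rather than on the top of the x87 register stack.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical example: dot product using only *scalar* SSE2 operations.
 * Only the low 64-bit lane of each XMM register is used; the upper lane
 * is ignored, so this is ordinary one-value-at-a-time FP arithmetic on
 * directly addressable registers (no x87 stack juggling). */
double scalar_sse2_dot(const double *a, const double *b, size_t n)
{
    __m128d acc = _mm_setzero_pd();                  /* low lane = 0.0 */
    for (size_t i = 0; i < n; ++i) {
        __m128d x = _mm_load_sd(&a[i]);              /* MOVSD: load one double */
        __m128d y = _mm_load_sd(&b[i]);              /* MOVSD: load one double */
        acc = _mm_add_sd(acc, _mm_mul_sd(x, y));     /* MULSD, then ADDSD */
    }
    return _mm_cvtsd_f64(acc);                       /* extract the low double */
}
```

With `a = {1, 2, 3}` and `b = {4, 5, 6}` this returns 32.0. In practice you rarely need intrinsics for this: on x86-64, compilers use scalar SSE2 for plain `double` arithmetic by default, and on 32-bit x86 GCC can be asked to do the same with `-msse2 -mfpmath=sse`.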