Computer Chess Club Archives


Subject: Re: 64-bit machines

Author: Eugene Nalimov

Date: 20:53:32 02/09/03


On February 09, 2003 at 23:31:35, Tom Kerrigan wrote:

>On February 09, 2003 at 22:19:12, Matt Taylor wrote:
>
>>The compiler has time to evaluate many different orderings. Furthermore, the
>>compiler has more flexibility in reordering; the processor can only reorder
>
>It can do _static_ reordering, not dynamic.
>
>>>If static scheduling is better than dynamic, why does McKinley deliver fewer
>>>SPECint/GHz than the similarly clocked 21364, SPARC64, and PA-RISC 8700 chips?
>>Actually, last I checked Sparc-64 scored a tad lower than IA-32 at the same
>>clock speed. I was looking just yesterday and noted that Sparc scores were
>>rather low.
>
>Actually, last I checked was 1 second ago:
>http://www.aceshardware.com/SPECmine/index.jsp?b=0&s=2&v=1&if=0&r1f=2&r2f=0&m1f=0&m2f=0&o=0&o=1
>
>>You are right that HT is pointless on VLIW chips. How is this a weakness? It
>>means the chip is already efficient enough that HT would not help it. That is
>>the point of VLIW computing! You don't need things like HT because your machine
>>is -already- efficient. Conversely IA-32 is weak in the area of efficiency.
>
>Nonsense. How many times are those IA-64 instruction bundles padded with NOPs
>because there isn't enough ILP to fill them or fit the pairing restrictions?
>Each one of those NOPs means an idle execution unit that could be devoted to
>processing another thread. Also, IA-64 suffers from memory latency just as much
>as the next chip. (Well, moreso, because it's in-order.) All that idle time
>waiting on memory could be spent processing another thread that's probably in
>cache.
>
>>IA-64 can do up to 6 instructions per cycle; the best IA-32 offers is 3. Again,
>
>Which doesn't matter. The Pentium III retires about 1.2 instructions per clock
>and McKinley doesn't do much better.
>
>>>That has nothing to do with the high latency of the register file caused by
>>>having so damn many registers.
>>Is this an assumption, or do you have proof? It would be an awkward machine
>>indeed if general register accesses weren't 1-cycle.
>
>What is your explanation of McKinley's low clock speed? The SPARC's huge
>register file is often blamed for its low clock speed.

Some time ago I talked with Intel's Itanium2 architect, and he gave a
*different* explanation for the relatively low clock speed. Sorry, I cannot
repeat it here.

BTW, you should compare the Itanium2 clock speed with the clock speeds of
*server* CPUs, not desktop ones. At the time Itanium2 was released, its clock
speed looked reasonably fast. And faster Itaniums will be released soon...

Thanks,
Eugene


>>I did not mean to imply that the entire application fit in 256KB. However,
>
>Imply? That's exactly what you said.
>
>>You said 1M for MMX and 2M for SSE. 1+2=3. I lump media instructions together
>>because both MMX and SSE are equally useless to compilers. (Of course I'm
>>ignoring the Intel C exception of using scalar SSE -- not useful to chess
>>programs, not very good justification of SSE either when they could have
>>introduced new flat-register FP instructions.)
>
>SSE2 is flat-register FP instructions...
>
>-Tom



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.