Computer Chess Club Archives


Subject: Re: 64-bit machines

Author: Tom Kerrigan

Date: 20:31:35 02/09/03


On February 09, 2003 at 22:19:12, Matt Taylor wrote:

>The compiler has time to evaluate many different orderings. Furthermore, the
>compiler has more flexibility in reordering; the processor can only reorder

It can do _static_ reordering, not dynamic.

>>If static scheduling is better than dynamic, why does McKinley deliver fewer
>>SPECint/GHz than the similarly clocked 21364, SPARC64, and PA-RISC 8700 chips?
>Actually, last I checked Sparc-64 scored a tad lower than IA-32 at the same
>clock speed. I was looking just yesterday and noted that Sparc scores were
>rather low.

Actually, last I checked was 1 second ago:
http://www.aceshardware.com/SPECmine/index.jsp?b=0&s=2&v=1&if=0&r1f=2&r2f=0&m1f=0&m2f=0&o=0&o=1

>You are right that HT is pointless on VLIW chips. How is this a weakness? It
>means the chip is already efficient enough that HT would not help it. That is
>the point of VLIW computing! You don't need things like HT because your machine
>is -already- efficient. Conversely IA-32 is weak in the area of efficiency.

Nonsense. How many times are those IA-64 instruction bundles padded with NOPs
because there isn't enough ILP to fill them or fit the pairing restrictions?
Each one of those NOPs means an idle execution unit that could be devoted to
processing another thread. Also, IA-64 suffers from memory latency just as much
as the next chip. (Well, more so, because it's in-order.) All that idle time
waiting on memory could be spent processing another thread that's probably in
cache.
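
To put a rough number on the NOP argument, here is a toy sketch in Python. It is not a model of any real IA-64 scheduler: the 6-slot issue group, the rule that an instruction cannot share a group with something it depends on, and the names `pack_in_order` and `nop_fraction` are all assumptions made for illustration. Real bundles also have template and pairing restrictions that this ignores.

```python
def pack_in_order(instrs, width=6):
    """Greedily pack instructions, in program order, into issue groups.

    instrs: list of (dest, sources) tuples, e.g. ("r2", ("r1",)).
    Toy rule (an assumption): an instruction starts a new group if it
    reads or rewrites a register written earlier in the current group,
    or if the current group is already full.
    """
    groups = []
    current = []
    written = set()
    for dest, srcs in instrs:
        depends = dest in written or any(s in written for s in srcs)
        if depends or len(current) == width:
            groups.append(current)
            current, written = [], set()
        current.append((dest, srcs))
        written.add(dest)
    if current:
        groups.append(current)
    return groups


def nop_fraction(groups, width=6):
    """Fraction of issue slots that would have to be padded with NOPs."""
    slots = len(groups) * width
    if slots == 0:
        return 0.0
    used = sum(len(g) for g in groups)
    return (slots - used) / slots


# A chain where every instruction needs the previous result (no ILP).
chain = [("r1", ()), ("r2", ("r1",)), ("r3", ("r2",)), ("r4", ("r3",))]
# Six fully independent instructions (plenty of ILP).
indep = [("r%d" % i, ()) for i in range(6)]

print(nop_fraction(pack_in_order(chain)))
print(nop_fraction(pack_in_order(indep)))
```

In this toy model the dependent chain leaves most slots empty while the independent block fills every slot. Those empty slots are the idle execution units a second thread could use.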

>IA-64 can do up to 6 instructions per cycle; the best IA-32 offers is 3. Again,

Which doesn't matter. The Pentium III retires about 1.2 instructions per clock
and McKinley doesn't do much better.

>>That has nothing to do with the high latency of the register file caused by
>>having so damn many registers.
>Is this an assumption, or do you have proof? It would be an awkward machine
>indeed if general register accesses weren't 1-cycle.

What is your explanation of McKinley's low clock speed? The SPARC's huge
register file is often blamed for its low clock speed.

>I did not mean to imply that the entire application fit in 256KB. However,

Imply? That's exactly what you said.

>You said 1M for MMX and 2M for SSE. 1+2=3. I lump media instructions together
>because both MMX and SSE are equally useless to compilers. (Of course I'm
>ignoring the Intel C exception of using scalar SSE -- not useful to chess
>programs, not very good justification of SSE either when they could have
>introduced new flat-register FP instructions.)

SSE2 is flat-register FP instructions...

-Tom



Last modified: Thu, 15 Apr 21 08:11:13 -0700
