Computer Chess Club Archives

Subject: Re: 64-bit machines

Author: Robert Hyatt
Date: 12:58:28 02/10/03
On February 10, 2003 at 15:09:18, Tom Kerrigan wrote:

>On February 10, 2003 at 02:41:50, Matt Taylor wrote:
>
>>>It can do _static_ reordering, not dynamic.
>>Reordering is reordering. Optimization at compile-time has more potential than
>>optimization at run-time. Run-time reordering has limited foresight.
>
>More potential, limited foresight, blah blah blah. No matter how many vague
>notions you attribute to IA-64, you still can't explain why it's not faster
>per-clock than several similarly-clocked OOO chips. Arguing with you about this
>is worthless.

I must have missed something.  Where is IA-64 _not_ faster than "similarly
clocked OOO chips"?  I.e., what 900 MHz Pentium can keep up with it?  For some
programs (such as Crafty), Pentiums clocked 3x faster can't keep up.  Someone
reported 4 GHz PIV numbers (no idea where they came from, or whether the chip was
overclocked) that were roughly in line with the 900 MHz I2 chip's speeds for
Crafty...


>
>>Dynamic reordering is valuable when you have only a few registers, so you can
>>kind of sort of make use of the 40 internal registers on IA-32 chips, but IA-64
>>has many. So what?
>
>OOO is said to increase 21264 performance by 30%. The 21264, BTW, has 32
>registers and 40 reorder registers.

I wouldn't doubt that.  However, the gain is partially the result of a compiler
not doing very well in the instruction scheduling area.  The sloppier it is
there, the better the OOO architecture will do, since it is simply stepping in
for the compiler and doing what should have been done already.  A comparison
between a _really_ good scheduling compiler and an OOO architecture should see
the compiler win every time.  It makes perfect sense to let the compiler do the
scheduling once, when the cost is irrelevant, rather than having the processor
do it every time it fetches the same instruction stream...
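A toy sketch of the argument above, under heavy simplifying assumptions (a single-issue core, made-up fixed latencies, every instruction writing a distinct register, and hypothetical names such as `list_schedule` and `run_out_of_order`): a compiler that list-schedules a basic block once lets an in-order core reach the same cycle count that a small out-of-order window finds at run time on the unscheduled code. It is not a model of IA-64, the 21264, or any real compiler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    dest: str      # register written (assumed unique within the block)
    srcs: tuple    # registers read
    latency: int   # cycles until the result can be used


# Hypothetical basic block: two independent load -> add chains, then a multiply.
NAIVE = [
    Instr("i0", "r1", ("a",), 3),        # r1 = load a
    Instr("i1", "r2", ("r1",), 1),       # r2 = r1 + 1  (waits on the load)
    Instr("i2", "r3", ("b",), 3),        # r3 = load b
    Instr("i3", "r4", ("r3",), 1),       # r4 = r3 + 1
    Instr("i4", "r5", ("r2", "r4"), 1),  # r5 = r2 * r4
]


def _operands_ready(ins, ready_at, in_block, cycle):
    """Live-in registers (not written in the block) are ready at cycle 0."""
    return all(r not in in_block or (r in ready_at and ready_at[r] <= cycle)
               for r in ins.srcs)


def run_in_order(program):
    """Single-issue, in-order core: an instruction issues no earlier than the
    cycle after its predecessor and not before its operands are ready."""
    ready_at, next_slot, finish = {}, 0, 0
    for ins in program:
        issue = max([next_slot] + [ready_at.get(r, 0) for r in ins.srcs])
        ready_at[ins.dest] = issue + ins.latency
        finish = max(finish, ready_at[ins.dest])
        next_slot = issue + 1
    return finish


def list_schedule(program):
    """Compile-time list scheduling for a single-issue in-order core: at each
    cycle, pick the data-ready instruction on the longest remaining latency path."""
    in_block = {ins.dest for ins in program}
    prio = {}
    for ins in reversed(program):  # consumers come later, so their prio is known
        consumers = [c for c in program if ins.dest in c.srcs]
        prio[ins.name] = ins.latency + max((prio[c.name] for c in consumers), default=0)
    ready_at, pending, order, cycle = {}, list(program), [], 0
    while pending:
        ready = [i for i in pending if _operands_ready(i, ready_at, in_block, cycle)]
        if ready:
            best = max(ready, key=lambda i: prio[i.name])  # ties: earliest in program order
            order.append(best)
            pending.remove(best)
            ready_at[best.dest] = cycle + best.latency
        cycle += 1  # one issue slot per cycle; the slot stays empty if nothing is ready
    return order


def run_out_of_order(program, window):
    """Toy dynamic scheduler: each cycle, issue the oldest instruction among the
    next `window` un-issued ones whose operands are ready (one issue per cycle)."""
    in_block = {ins.dest for ins in program}
    ready_at, pending, cycle, finish = {}, list(program), 0, 0
    while pending:
        for ins in pending[:window]:
            if _operands_ready(ins, ready_at, in_block, cycle):
                ready_at[ins.dest] = cycle + ins.latency
                finish = max(finish, ready_at[ins.dest])
                pending.remove(ins)
                break
        cycle += 1
    return finish


if __name__ == "__main__":
    scheduled = list_schedule(NAIVE)
    print([i.name for i in scheduled])        # ['i0', 'i2', 'i1', 'i3', 'i4']
    print(run_in_order(NAIVE))                # 9: stalls behind each load
    print(run_in_order(scheduled))            # 6: loads overlapped by the compiler
    print(run_out_of_order(NAIVE, window=4))  # 6: hardware finds the same overlap
```

The priority function (longest latency path to the end of the block) is the usual critical-path heuristic for list scheduling; the fixed, known latencies are what make doing it once at compile time possible in this sketch.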

This was an issue on the Cray, for example, and their compilers were _very_ good
at instruction scheduling, as there was _no_ OOO execution whatsoever.
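The register counts traded earlier in the thread (the 21264's reorder registers, the "40 internal registers" Matt mentions for IA-32) are about register renaming: hardware gives each write a fresh physical register so that reusing an architectural name does not create false WAR/WAW dependences between otherwise independent instructions. With enough architectural registers, a compiler can avoid those name clashes itself, which is part of Matt's point about IA-64. A minimal sketch, assuming an unbounded physical register pool with no freeing or retirement; the program and the helpers `rename` and `name_dependences` are hypothetical.

```python
from itertools import count

# Hypothetical block that reuses r1 for two independent load -> add chains.
PROGRAM = [
    ("r1", ("a",)),   # r1 = load a
    ("r2", ("r1",)),  # r2 = r1 + 1
    ("r1", ("b",)),   # r1 = load b   (reuses the name r1)
    ("r3", ("r1",)),  # r3 = r1 + 1
]


def rename(program):
    """Give every write a fresh physical register; reads use the latest mapping.
    Registers read before any write in the block keep their name (live-ins)."""
    fresh = count()
    mapping = {}
    out = []
    for dest, srcs in program:
        new_srcs = tuple(mapping.get(r, r) for r in srcs)  # read before this write
        phys = f"p{next(fresh)}"
        mapping[dest] = phys
        out.append((phys, new_srcs))
    return out


def name_dependences(program):
    """Count (WAR, WAW) pairs i < j: the 'false' dependences renaming removes."""
    war = waw = 0
    for i, (d_i, s_i) in enumerate(program):
        for d_j, _ in program[i + 1:]:
            if d_j == d_i:
                waw += 1
            if d_j in s_i:
                war += 1
    return war, waw


if __name__ == "__main__":
    print(name_dependences(PROGRAM))          # (1, 1)
    print(rename(PROGRAM))
    print(name_dependences(rename(PROGRAM)))  # (0, 0)
```

Before renaming, the second write to r1 cannot be moved ahead of the instruction that still reads the first r1 value; after renaming, the two load/add chains share no names and either can be scheduled first.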

>
>>Yes. It appears I was looking at a 32-bit Sparc machine. I was reading a paper
>
>Have any 32 bit SPARCs been made since 1995?
>

Hard to classify a SPARC.  More like the Athlon 64-bit hybrid than the IA-64,
IMHO, although the performance is so bad as to make it worthless for any
high-performance applications.  Even Sun's OS (Solaris) has a mixture of 32-bit
and 64-bit code all over the place, with most applications being 32-bit.

>>It seems the SPEC scores are generally higher on chips with more cache, and the
>>only McKinley score listed has a 1.5 MB L3 cache.
>
>I can't seem to access SPEC scores right now, but what's the point of a
>super-awesome post-RISC ISA if it's just going to get beat by chips with more
>cache? And if cache really is the limiting factor in McKinley's performance
>here, it must be idle a significant amount of time, which reduces IPC and means
>HT would be beneficial.
>
>>Again, I have no actual experience with an IA-64 machine because they're rather
>>expensive. I can only rely on what I've read. I have never read anything about
>>low IPC on IA-64. Please offer some evidence/article.
>
>It can still be relatively high and benefit from HT.
>
>>In compiler-generated code, my Athlon tends to retire closer to 2 instructions
>>per clock. I would assume that McKinley does better. The restrictions really
>
>Which tool are you using to measure that?
>
>>>>ignoring the Intel C exception of using scalar SSE -- not useful to chess
>>>>programs, not very good justification of SSE either when they could have
>>>>introduced new flat-register FP instructions.)
>>Original SSE is flat-register FP. SSE 2 allows double-precision FP computation.
>
>How do you make these two statements agree?
>
>-Tom
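Tom's HT argument (a core that sits waiting on cache misses has idle issue slots a second thread could fill) can be illustrated with a toy model. Everything in it is an assumption: a single-issue core, a fixed miss penalty, made-up burst lengths, and hypothetical helper names (`make_thread`, `simulate`). It shows the mechanism only and says nothing quantitative about McKinley or any shipping HT design. IPC here is simply instructions retired divided by cycles, the same ratio one would compute from retired-instruction and cycle counts.

```python
from collections import deque


def make_thread(bursts, burst_len, miss_cycles):
    """Hypothetical workload: `bursts` runs of `burst_len` instructions, each run
    ending in a cache miss that stalls the thread for `miss_cycles` cycles."""
    segs = []
    for _ in range(bursts):
        segs += [("run", burst_len), ("stall", miss_cycles)]
    return segs


def simulate(threads, smt):
    """Toy single-issue core; returns (instructions_retired, cycles).
    smt=False: threads run back to back, each alone on the core.
    smt=True: each cycle the core issues one instruction from any thread that is
    not waiting on a miss, rotating priority between threads."""
    if not smt:
        runs = [simulate([t], smt=True) for t in threads]  # one thread at a time
        return sum(r for r, _ in runs), sum(c for _, c in runs)
    state = [deque([kind, n] for kind, n in t) for t in threads]
    stalled_until = [0] * len(threads)  # first cycle each thread may issue again
    retired = cycle = 0
    while any(state):
        for off in range(len(threads)):
            k = (cycle + off) % len(threads)  # round-robin priority
            segs = state[k]
            if segs and stalled_until[k] <= cycle:
                segs[0][1] -= 1               # issue one instruction
                retired += 1
                if segs[0][1] == 0:           # run finished: its last op misses
                    segs.popleft()
                    if segs and segs[0][0] == "stall":
                        stalled_until[k] = cycle + 1 + segs.popleft()[1]
                break
        cycle += 1
    # Done when the last instruction has issued and the last miss has resolved.
    return retired, max([cycle] + stalled_until)


if __name__ == "__main__":
    t = make_thread(bursts=4, burst_len=3, miss_cycles=6)
    r, c = simulate([t, t], smt=False)
    print(r, c, round(r / c, 2))  # 24 72 0.33
    r, c = simulate([t, t], smt=True)
    print(r, c, round(r / c, 2))  # same work in fewer cycles: higher IPC
```

The same instruction count finishes in fewer cycles with two threads because one thread's miss stalls are filled with the other's instructions, which is the sense in which a low per-thread IPC leaves room for HT to help.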
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.