Computer Chess Club Archives



Subject: Re: IA-64 vs OOOE (attn Taylor, Hyatt)

Author: Robert Hyatt

Date: 17:59:21 02/12/03



On February 12, 2003 at 16:03:21, Tom Kerrigan wrote:

>On February 12, 2003 at 11:33:24, Robert Hyatt wrote:
>
>>>Of course it's related. Compilers have to rely on static branch prediction (80%
>>>accuracy) if they're going to effectively advance instructions before
>>You keep saying this but it isn't true.  I can pull _any_ instruction and insert
>>it before a branch, assuming it is architecturally feasible.  On a SPARC, with
>>32 GP
>
>Okay, so let's say you advance instructions from both branches. Maybe you have
>enough execution resources to do that, i.e., you have enough slots to handle all
>of the ILP before and after the branch (both paths). What about the branches
>after that? It's not uncommon for Pentiums to pair instructions with 2 or 3
>branches between them. You have to run into issue constraints at some point.

Sure you reach a point of "stop and clear things out".  But that happens in
an OOOE processor also.

>
>Occam's razor. Why is every high performance processor these days OOO, except
>for IA-64 and SPARC? MIPS, POWER, Alpha, PA-RISC. None of these chips are
>register starved and there are excellent compilers for all of them (esp.
>PA-RISC) but the chip designers still saw value in going way out of their way to
>make them OOO. That's not easy. They must have seen some value in it. Do you
>think they just did it on a whim? And is it just a coincidence that the US3 is
>so slow?



As I said previously, OOOE is _another_ solution to the optimization problem.
The other solution is a better compiler that can schedule instructions better.
Either one can produce very good results without the other.  They are somewhat
complementary in some cases, but in general, when you do worse with one, the
other approach will pick up the slack.  And with the complexities of optimizing
for Intel processors (each processor has quirks that are completely different
from the quirks on other processors) the compilers have a tough time.

I haven't thought much about the OOOE aspect of the Alpha since it is
essentially "dead", and I quit reading what they were up to.  The point
is, generally, that OOOE can help at times, not help at others, but since
it doesn't cost anything but transistors, it is generally worth including,
for the cases where the compiler blows it.




>
>>>Indeed. It's a shame only IA-64 chips run compiled code... oh, wait...
>>Notice his point, however.  The OOOE can only execute what it can "see".  Which
>>is typically a pretty narrow "peephole" into the machine language instructions.
>>Compilers
>
>I understand his point but you don't understand mine. The compiler tricks you're
>discussing also increase performance of OOO processors. I mean, OOO processors
>don't have special logic to stall on instructions that are too far apart in the
>source code.

No, but if the compiler does it perfectly, the OOO processor executes everything
_in order_.  Which is the point of the idea.  OOOE lets the processor overcome
bad instruction scheduling as produced by the compiler.  If the compiler gets
it right, OOOE means very little, unless you count X86 with the register issue.
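
To make the scheduling point concrete, here is a rough source-level sketch in C.  The
function names (dot4_naive, dot4_scheduled) are made up for illustration, and an
optimizing compiler may well turn both into the same machine code, so read it as
a picture of the two orderings, not as a benchmark.  The first version uses each
loaded value immediately; the second issues all the loads first, then the
independent multiplies, then a tree of adds, which is roughly the order a good
static scheduler would aim for on an in-order machine.

```c
#include <stdint.h>

/* Naive order: every statement depends on the running sum from the
 * statement before it, and each load is consumed right away. */
uint32_t dot4_naive(const uint32_t *x, const uint32_t *y)
{
    uint32_t sum = 0;
    sum += x[0] * y[0];
    sum += x[1] * y[1];
    sum += x[2] * y[2];
    sum += x[3] * y[3];
    return sum;
}

/* Hand-scheduled order: loads grouped first, then four independent
 * multiplies, then a two-level add tree.  Unsigned arithmetic wraps,
 * so the result is identical to dot4_naive for all inputs. */
uint32_t dot4_scheduled(const uint32_t *x, const uint32_t *y)
{
    /* All loads up front, so none of them waits on arithmetic. */
    uint32_t x0 = x[0], x1 = x[1], x2 = x[2], x3 = x[3];
    uint32_t y0 = y[0], y1 = y[1], y2 = y[2], y3 = y[3];

    /* Four multiplies with no dependencies on each other. */
    uint32_t p0 = x0 * y0;
    uint32_t p1 = x1 * y1;
    uint32_t p2 = x2 * y2;
    uint32_t p3 = x3 * y3;

    /* Add tree: (p0 + p1) and (p2 + p3) can proceed in parallel. */
    return (p0 + p1) + (p2 + p3);
}
```

If the code already arrives in the second form, an in-order core can issue it
without stalling on the previous result, and an OOO core has nothing left to
reorder.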


>
>>>>No. Predication is the IA-64's answer to branch prediction. Predication is
>>>>completely unrelated to OOOE.
>>>What, exactly, do you think the point of predication is, then? It's to allow
>>>instructions to execute before the condition is determined, in other words, out
>>>of order. (Or at least in order without being dependent.) If you think
>>But they _are_ different.  Predication just says "do all of this crap and we'll
>>sort out later which was crap and which was important."  A compiler can do this
>>on an old 286,
>
>The point of predication is to eliminate dependency on a branch. How can a
>compiler do this? In other words, how can a compiler say "we'll sort out later
>which was crap and which was important" without a branch on a non-predicated
>ISA?

Easy.  Do _both_ pathways at the same time, then use (say) a cmov (yes, Alpha
has 'em as they came up with the idea in the first place) to pick the right
result rather than branching at all...
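
A minimal sketch in C of what "do both pathways, then pick" might look like.  The
function names and the toy computations (a + 1 versus b * 2) are hypothetical,
chosen only for illustration.  Whether a compiler actually emits a conditional
move for select_both depends on the target and the optimization settings, so
that part is an assumption; select_mask shows the same idea with plain integer
operations, for an ISA that has no conditional move at all.

```c
#include <stdint.h>

/* Branching version: only one pathway is executed, and which one
 * depends on the outcome of the branch. */
uint32_t select_branch(int cond, uint32_t a, uint32_t b)
{
    if (cond)
        return a + 1;
    else
        return b * 2;
}

/* Both pathways are computed unconditionally, then one result is
 * picked.  A compiler may (not must) lower the ?: to a cmov. */
uint32_t select_both(int cond, uint32_t a, uint32_t b)
{
    uint32_t then_val = a + 1;   /* "taken" pathway */
    uint32_t else_val = b * 2;   /* "not taken" pathway */
    return cond ? then_val : else_val;
}

/* Same idea without relying on a conditional move: build an
 * all-ones or all-zeros mask from the condition and blend. */
uint32_t select_mask(int cond, uint32_t a, uint32_t b)
{
    uint32_t then_val = a + 1;
    uint32_t else_val = b * 2;
    uint32_t mask = 0u - (uint32_t)(cond != 0);  /* 0xFFFFFFFF or 0 */
    return (then_val & mask) | (else_val & ~mask);
}
```

This only works when both pathways are safe to execute unconditionally (no
stores, no loads that might fault), and it costs the work of the pathway that
gets thrown away.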




>
>>Slower.  But you missed the key phrase above "clock-for-clock".  Show me _any_
>>1.0 GHz X86 that can search (running Crafty) 1.6M nodes per second.  There
>>isn't one.  In fact, there isn't any X86 I know of that can do that at any
>>clock rate, yet.  My 2.8's peg at 1.0-1.2M nodes per second, and these are
>>Xeons with SMT on.
>
>Of course there will be some programs that run especially well on certain
>architectures. I already said this in the old thread. The G4 beats the crap out
>of x86s at SETI. Are you going to run Crafty on a Mac at the next tournament you
>enter? I'm sure it wouldn't be hard to find a chess program that runs like crap
>on IA-64. In fact, I know of one right now.
>

I wouldn't mind a G4 at all.  In fact, I tried to find a good multiple-CPU
PPC machine when I bought my first quad P6/200, but could not find anything
that was much beyond vaporware at the time.  The processor looks pretty good
to me however.




>-Tom




Last modified: Thu, 15 Apr 21 08:11:13 -0700
