Computer Chess Club Archives


Subject: Re: IA-64 vs OOOE (attn Taylor, Hyatt)

Author: Robert Hyatt

Date: 12:37:05 02/12/03


On February 12, 2003 at 14:54:37, Matt Taylor wrote:

>On February 12, 2003 at 11:35:46, Robert Hyatt wrote:
>
>>On February 12, 2003 at 03:13:27, Matt Taylor wrote:
>>
>>>On February 12, 2003 at 00:23:53, Robert Hyatt wrote:
>>>
>>>>On February 11, 2003 at 23:27:04, Tom Kerrigan wrote:
>>>>
>>>>>On February 11, 2003 at 23:11:09, Charles Roberson wrote:
>>>>>
>>>>>>
>>>>>>  Out-of-order execution is nothing more than the ability to execute
>>>>>>instructions in an order different from the serial order in the code.
>>>>>>It has nothing to do with branching, but it enables other branching techniques.
>>>>>>OOOE is simply:
>>>>>>   1) the code has instructions a,b,c,d, in that order
>>>>>>   2) if there are no serial dependencies then they can be executed in the
>>>>>>       b,d,c,a order.
>>>>>>
>>>>>>    That is all OOOE is.
>>>>>
>>>>>I don't see how this is different from what I said. Branches are instructions
>>>>>too.
>>>>>
>>>>>-Tom
>>>>
>>>>
>>>>What he is saying is that whatever the hardware can do with OOO execution,
>>>>the compiler can replicate it by massaging the instruction stream with well-
>>>>known optimization tricks.  With the sole exception of register renaming.
>>>>
>>>>The reason OOO execution works so well on Intel is _solely_ based on the
>>>>fact that the architecture has almost no registers.  And renaming lets the
>>>>hardware expand that number of registers _significantly_ so that the
>>>>architecture can do things that other less-register-challenged architectures
>>>>can do without OOO execution as a crutch...
>>>>
>>>>IE I can show you code for the Cray that executes an instruction every cycle
>>>>that an instruction can execute, yet it is a serial-order execution processor
>>>>from the ground-up, but with help from a _really_ good instruction scheduler
>>>>pass after the final object code has been generated...  This scheduler can
>>>>replicate/hoist instructions as needed to back them up to the point that their
>>>>result is ready the cycle it is needed...
>>>
>>>Some of my bitscan code for the Athlon executed a useful instruction in every
>>>slot -- 3 IPC in 15-20 cycles of code. The sole enabling factor was the fact
>>>that I moved instructions everywhere. It was a nightmare to debug when I
>>>accidentally moved instructions in front of their dependencies.
>>>
>>>One of the biggest gains I had was moving register loads a fair number of cycles
>>>backward when I had free slots. This is difficult on IA-32 for obvious reasons,
>>>but it works very well when you have a larger number of registers.
>>>
>>>-Matt
>>
>>
>>Right (to your last paragraph).  And it is a major reason why OOOE is worthwhile
>>on the X86, because the hardware can rename the registers, lift the instructions
>>and execute them earlier so that by the time the data is needed, it is available
>>in a "hidden" register that suddenly has the needed data at the right time.
>>
>>No way to do that in a compiler.  Unless the architecture has enough registers
>>so that you don't run out with all the pre-loads.
>
>I can think of few functions that have 128 intermediate computations, even if
>they do heavy computation. The compiler often won't need to throw away any
>intermediates. Matrix manipulations and quaternion math come foremost to mind as
>good examples. Spaghetti code wouldn't have too much trouble in creating poor
>performance, but that also has an obvious rebuttal.
>
>-Matt


Right, but on the Cray T90, for example, a memory read takes 50 cycles.  So I
need to "hoist" the memory load back up the I-stream 50 instructions at least.
Which means that for the next 50 instructions, that register is "off limits"
for use since it already has a read scheduled and pending.  When you have that
kind of delay, you can tie up a bunch of registers for a long period of time...
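
[Editor's sketch] The register-pressure argument above can be put into rough
numbers.  The Python below is only an illustrative model, not a description of
any real Cray scheduler: it assumes one instruction per cycle, that every load
is hoisted exactly `latency` slots ahead of its first use, and that the
destination register is reserved from the issue slot until that use.  The
function name and the example load spacings are hypothetical; only the
50-cycle latency comes from the post.

```python
def peak_pending_loads(use_slots, latency):
    """Return the peak number of registers reserved by in-flight loads.

    use_slots: instruction slots at which a loaded value is first consumed.
    latency:   load latency in cycles (assumed: one instruction per cycle).

    Each load is hoisted `latency` slots before its use (clamped at slot 0),
    and its destination register counts as reserved from issue until use.
    """
    events = []
    for use in use_slots:
        issue = max(0, use - latency)
        events.append((issue, +1))  # register reserved when the load issues
        events.append((use, -1))    # register released when the value is used
    # Tuples sort by slot first, then delta, so a release in a given slot is
    # processed before a reservation in that same slot.
    events.sort()
    live = peak = 0
    for _, delta in events:
        live += delta
        peak = max(peak, live)
    return peak


if __name__ == "__main__":
    # One load consumed every 10 instructions, 50-cycle latency.
    print(peak_pending_loads(range(50, 150, 10), 50))
    # One load consumed every 2 instructions, 50-cycle latency.
    print(peak_pending_loads(range(50, 150, 2), 50))
    # Same load density with a hypothetical 4-cycle latency.
    print(peak_pending_loads(range(50, 150, 2), 4))
```

Under these assumptions the steady-state count is roughly the latency divided
by the spacing between loads, so a long latency combined with dense loads ties
up a large share of the register file before any intermediate results are
counted.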



Last modified: Thu, 15 Apr 21 08:11:13 -0700
