Computer Chess Club Archives


Subject: Re: IA-64 vs OOOE (attn Taylor, Hyatt)

Author: Tom Kerrigan

Date: 01:55:59 02/12/03


On February 12, 2003 at 00:37:13, Robert Hyatt wrote:

>No No No.  They do much of this with 100% accuracy.  Because they make sure
>that the critical instructions are executed in _every_ path that reaches a
>critical point in the data-flow analysis of the program (the dependency graph
>for gcc users)...

You're not making any sense. You have a branch. You have two possible control
paths. The instructions in each path are different. Which ones do you advance?

>BTW OOOE has a huge limit.  Something like 40-60 (I don't have my P6/etc
>manuals here at home) micro-ops in the reorder buffer.  No way to do any
>OOOE beyond that very narrow peephole, while the compiler can see _much_
>more, as much as it wants (and has the compile time) to look at...

Alright. So run compiled code on your OOO processor.

>registers when the real instructions get turned into micro-ops...  but at
>least the latter is more a result of a horrible architecture (8 registers)
>as opposed to the fact the OOO execution is a huge boon for other architectures
>that are not so register-challenged...

Funny, my 30% number was for the Alpha and MIPS chips. I wouldn't consider them
register challenged.

>Sure.  But given the choice of OOOE with 8 int alus, or no OOOE with 16
>int alus and an instruction package large enough to feed them all, I would
>consider the latter seriously...

We have chips today with 9 execution units that retire, on average, one
instruction per cycle, and you think you can fill 16 int slots?

>The Cray T932 was the last 64 bit machine they built that I used.  And it
>can produce a FLOP count that no PC on the planet can come within a factor of
>10 of and that is being very generous.  2ns clock, 32 cpus, each cpu can read
>four words and write two words to memory per clock cycle, and with vector
>chaining, it can do at _least_ eight floating point operations per cycle per
>CPU.

How many NPS does Crafty get on it?

>I did a branchless FirstOne() in asm a few weeks back here, just to test.
>It used a cmov, and it wasn't slower than the one with a branch.  If the

On a Pentium III?

-Tom
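
For readers following the FirstOne() exchange above, here is a minimal sketch of what "branch" versus "branchless" can mean for a 64-bit bit scan on a 32-bit machine. It is not the asm from the quoted post, which is not shown in this thread. The names first_one_branch, first_one_select and low_bit32 are made up for illustration, and the bit numbering (bit 0 = least significant) is an assumption that may not match Crafty's convention. The select version computes both halves and picks one with a mask; a compiler may turn that into a cmov on P6-class chips, but that is up to the compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the lowest set bit of a 32-bit word (de Bruijn multiply).
   Returns 0 for x == 0, so it is safe to call on an empty half. */
static int low_bit32(uint32_t x) {
    static const int table[32] = {
        0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
        31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
    };
    return table[((x & (0u - x)) * 0x077CB531u) >> 27];
}

/* Version with a data-dependent branch on the low half. */
int first_one_branch(uint64_t b) {
    uint32_t lo = (uint32_t)b;
    uint32_t hi = (uint32_t)(b >> 32);
    assert(b != 0); /* debug-only precondition; compiled out under NDEBUG */
    if (lo != 0)
        return low_bit32(lo);
    return 32 + low_bit32(hi);
}

/* Version without that branch: scan both halves, then select by mask. */
int first_one_select(uint64_t b) {
    uint32_t lo = (uint32_t)b;
    uint32_t hi = (uint32_t)(b >> 32);
    int from_lo, from_hi, mask;
    assert(b != 0); /* debug-only precondition; compiled out under NDEBUG */
    from_lo = low_bit32(lo);
    from_hi = 32 + low_bit32(hi);
    mask = -(lo != 0); /* all ones if the low half is non-empty, else 0 */
    return (from_lo & mask) | (from_hi & ~mask);
}
```

The trade-off is the one being argued in the thread: the select version always does the work for both halves but has no branch to mispredict, while the branch version does less work when the branch is predicted correctly. Which one is faster depends on the chip and on how predictable the data is.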



Last modified: Thu, 15 Apr 21 08:11:13 -0700