Author: Robert Hyatt
Date: 08:47:21 02/12/03
On February 12, 2003 at 04:55:59, Tom Kerrigan wrote:

>On February 12, 2003 at 00:37:13, Robert Hyatt wrote:
>
>>No No No. They do much of this with 100% accuracy. Because they make sure
>>that the critical instructions are executed in _every_ path that reaches a
>>critical point in the data-flow analysis of the program (the dependency graph
>>for gcc users)...
>
>You're not making any sense. You have a branch. You have two possible control
>paths. The instructions in each path are different. Which ones do you advance?

_BOTH_. That is the point.

>
>>BTW OOOE has a huge limit. Something like 40-60 (I don't have my P6/etc
>>manuals here at home) micro-ops in the reorder buffer. No way to do any
>>OOOE beyond that very narrow peephole, while the compiler can see _much_
>>more, as much as it wants (and has the compile time) to look at...
>
>Alright. So run compiled code on your OOO processor. Everyone does...
>
>>registers when the real instructions get turned into micro-ops... but at
>>least the latter is more a result of a horrible architecture (8 registers)
>>as opposed to the fact the OOO execution is a huge boon for other architectures
>>that are not so register-challenged...
>
>Funny, my 30% number was for the Alpha and MIPS chips. I wouldn't consider them
>register challenged.

I can't quite follow the quote above, as the last sentence is missing context. But I don't think OOOE is _nearly_ as important for decent architectures as it is for architectures that have significant design problems like X86.

I.e., again backing up to the Cray: 64 scalar registers, 64 scalar temp registers (all you can do with temps is copy them to/from the regular registers). That machine was not hard to keep very busy, executing one instruction per clock cycle for long periods of time (I am ignoring vector operations, where it could produce multiple results in a single clock).
But with that many registers, the memory delays were not noticeable, because you just lift the load instructions far enough back in the instruction stream that the data is there when you want it. Oh, you are "branching" to this point, so how can you lift beyond the entry point? Easy if you have enough registers. Just lift the instruction back to _every_ point that branches here. Same song, second verse. If I had only had 4 GP registers, this would not have been possible, of course, which is the issue for X86, and it is the main reason why OOOE is so effective there.

>
>>Sure. But given the choice of OOOE with 8 int alus, or no OOOE with 16
>>int alus and an instruction package large enough to feed them all, I would
>>consider the latter seriously...
>
>We have chips today with 9 execution units that retire, on average, one
>instruction per cycle, and you think you can fill 16 in slots?

Yes I do, without a doubt. Just as surely as I can use 16 cpus...

>
>>The Cray T932 was the last 64 bit machine they built that I used. And it
>>can produce a FLOP count that no PC on the planet can come within a factor of
>>10 of and that is being very generous. 2ns clock, 32 cpus, each cpu can read
>>four words and write two words to memory per clock cycle, and with vector
>>chaining, it can do at _least_ eight floating point operations per cycle per
>>CPU.
>
>How many NPS does Crafty get on it?

About 7M. And that was Cray Blitz, not Crafty. I have not tried to run Crafty on a Cray, because the Cray is a vector architecture and Crafty is not well-suited to that kind of machine. Cray Blitz was definitely optimized for it, however, and it also used a number of bitmaps, although it was not a "full-blown" bitmap program because vectors were even faster.

>
>>I did a branchless FirstOne() in asm a few weeks back here, just to test.
>>It used a cmov, and it wasn't slower than the one with a branch. If the
>
>On a Pentium III?
>

On a Pentium IV.
Although I did test it on my PIII Xeon box, so I guess the answer is "yes" to the III as well...

>-Tom
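
For readers who have not seen a branchless FirstOne(), here is a minimal C sketch of the general idea. It is not the asm from the post and not Crafty's code. The names `first_one` and `msb32` are made up for illustration, and it assumes the result is the index of the highest set bit counted from the least significant end (0..63) for a nonzero word, which may not match Crafty's bit numbering. The usual branch picks the high or low 32-bit half; here that choice is made with a mask, which is the kind of select a compiler may turn into a cmov (not guaranteed).

```c
#include <stdint.h>

/* Index (0..31) of the highest set bit of a nonzero 32-bit word.
   Smear the top bit downward, then count the ones; no branches. */
static unsigned msb32(uint32_t x)
{
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    /* x is now all ones from the top set bit down; popcount - 1 is its index */
    x = x - ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;
    return ((x * 0x01010101u) >> 24) - 1u;
}

/* Hypothetical branchless FirstOne(): index (0..63) of the highest set bit
   of a nonzero 64-bit bitboard. The result for b == 0 is not meaningful. */
static unsigned first_one(uint64_t b)
{
    uint32_t hi = (uint32_t)(b >> 32);
    uint32_t lo = (uint32_t)b;
    /* mask is all ones when the high half is nonzero, otherwise zero */
    uint32_t mask = (uint32_t)0 - (uint32_t)(hi != 0);
    /* select the half to scan without an if/else */
    uint32_t half = (hi & mask) | (lo & ~mask);
    /* add 32 only when the high half was selected */
    return msb32(half) + (32u & mask);
}
```

Calling `first_one(0x100000000ULL)` selects the high half and returns 32; a word with only bit 0 set returns 0. Whether this beats the branching version depends on the processor and on how predictable the branch is, which is exactly what the test in the post was about.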
Last modified: Thu, 15 Apr 21 08:11:13 -0700