Author: Robert Hyatt
Date: 17:12:16 07/22/03
On July 22, 2003 at 20:01:32, Tom Kerrigan wrote:

>On July 22, 2003 at 18:31:51, Robert Hyatt wrote:
>
>>On July 22, 2003 at 17:41:55, Tom Kerrigan wrote:
>>
>>>On July 22, 2003 at 15:27:10, Robert Hyatt wrote:
>>>
>>>>On July 22, 2003 at 15:02:46, Tom Kerrigan wrote:
>>>>
>>>>>On July 22, 2003 at 14:18:23, Robert Hyatt wrote:
>>>>>
>>>>>>>Of course, this is contrary to the point of a conditional move instruction. My
>>>>>>>only comment to that is that Intel must have decided to add the conditional move
>>>>>>>after they were done designing the relevant parts of the core. The decision to
>>>>>>>add the instruction makes sense for forward-compatibility, i.e., "use this
>>>>>>>instruction and you will see a performance improvement with it on later
>>>>>>>processors."
>>>>>>
>>>>>>That could be. However, the idea was not new. The alpha did this 10+ years
>>>>>>ago. So the advantage to a real CMOV implementation should be real.
>>>>>
>>>>>Did I ever say it was new?
>>>>
>>>>I'll play the game.
>>>>
>>>>Did I _say_ you said it was new??
>>>
>>>No, but that's what I read into the "However" part of "However, the idea was not
>>>new." What in the world were you Howevering if you didn't think I thought it was
>>>new? (And BTW, it was new, for the x86.)
>>>
>>>>> Did I say that Intel's implementation is ideal?
>>>>
>>>>Did I say you said it was ideal?
>>>
>>>Similar argument. Another "however."
>>
>>It is cold outside, however I don't like it myself.
>>
>>What does that "however" have to do with implying you said it was
>>cold, _or_ you didn't like the cold?
>>
>>It is just a conjunction to join two sentences together, one of which
>>modifies the idea proposed in the other.
>>
>>>
>>>>If Intel implemented it poorly, when it had been implemented _correctly_ 12
>>>>years previously, then I hardly think "I am being an ass for criticizing them
>>>>for doing it _badly_."
>>>>
>>>>Had they left it out, I _would_ have had complaints, because the alpha has
>>>>an instruction that directly addresses a common operation in C, the conditional
>>>>assignment operator.
>>>>
>>>>So I feel perfectly sane in complaining if they omit something that has been
>>>>around long enough for them to include it, or if they include something but
>>>>implement it poorly so that it doesn't do what the "concept" suggests it
>>>>does, namely eliminate branch mis-predictions by eliminating branches. It's
>>>>not hard to do this in hardware. It's been done more than once already.
>>>
>>>Oh, come now. You can't seriously mean this. I'll do you one better--ARM had a
>>>fully predicated ISA in '83, so Intel is a bunch of idiots for not adding full
>>>predication to the 386. (Well, DEC too, for that matter, because they only have
>>>cmove.)
>>>
>>>-Tom
>>
>>I'm not hyped on predication. CMOV does _not_ have to be implemented in that
>>way. IE it isn't on the alpha. It just moves one of two values into a
>
>In what way? In the predication way? All predication means is that instructions
>are executed according to a predicate, i.e., CONDITIONALLY. There is no
>associated implementation, although it's obvious how it would be useful in terms
>of speculative execution. And cmov DOES have to be implemented in the
>"predication" way because that's exactly what cmov means--"conditional move."
>

I'm again talking about a classic "predication" architecture. CMOV does not
require such a design. It is a simple gate decision based on a zero/one that
is produced somewhere. ("is" means "can be" of course, depending on how it is
implemented.)

>>destination, depending on some logical result. In the hardware I can think of
>>a _trivial_ way to make this happen. IE a 2-1 demultiplexor.
>
>Sure, if you don't bother to think it through to any degree at all. Demux of
>what? It seems entirely possible that Intel would have had to widen/change the
>format of microops in the P6 to handle both a condition and an operand. That
>would have obviously been a TREMENDOUS change to the core.

In a typical architecture, there are three data paths. Two source operands
and a destination operand. Most instructions pipe two values in on the source
busses, give the ALU some sort of N-bit opcode to specify the particular
operation to do on those two operands, and then the result is gated to the
output bus. For a CMOV, rather than operating on _both_ busses, you choose one
and gate the output to the appropriate destination, ignoring the other.

>
>Imagine you're Intel and you've already designed and tested most of the P6
>datapaths (most of which shuttle microops of a certain format around) and you
>find out that the next version of the x86 core will be able to do conditional
>moves. What are you going to do? Make the P6 more compatible with the P4 by
>implementing the instruction as a simple rule in the decoder, or go and change
>the format of your microops, which would force you to touch the core all over
>and completely redo all your testing and validation? I'm sure Intel really
>wishes you were there to let them know they should have done the latter.

You are _assuming_ that was their reasoning. You _may_ be right. But, there
_might_ be other reasons. IE, as I said, CMOV _does_ appear to be a bit
faster.

>
>>So it's "been there, done that, got the T-shirt" for several vendors. If Intel
>>did it badly, they just did it badly. However, it is hard to explain how it
>>appears to be faster than a traditional compare and jump... Yet it was when I
>>did the assembly testing.
>
>Well, I could be wrong about the whole thing. I've read threads where people
>were bitter about cmov running so slowly on P6s despite being fast on Athlons
>and P4s but I've never tested it myself so I can't say anything intelligent
>about your tests. I'm just repeating some information that I heard somewhere
>else, take it or leave it.
>
>-Tom

I simply tested it. I decided to write a "branchless" FirstOne() function a
while back and posted it here. Not because I thought it would be a lot faster
(it wasn't) but because I thought it would be illustrative of what a CMOV can
do for code. The drawback was that it _always_ did two BSF/BSR instructions.
On the PIV they are not particularly fast although they are far better than on
the old pentium-1. The plus was that there were potentially _no_
mispredictions. I simply assumed that when Intel did it, they did it
reasonably, because others had already done it and done it right.
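
For readers who have not seen the idea, here is a rough C sketch of what a "branchless" FirstOne() could look like. It is not the code that was posted to the forum. The names first_one_branchless and msb32 are hypothetical, the bit numbering (bit 0 = least significant, result = index of the most significant set bit) is an assumption, and the portable msb32 helper merely stands in for a BSR instruction. Both 32-bit halves are always scanned, which mirrors the "always two BSF/BSR" drawback, and the final two-way select is the kind of expression a compiler may turn into a CMOV, although nothing in C guarantees that.

```c
#include <stdint.h>

/* Stand-in for a 32-bit BSR: index of the highest set bit (0..31).
 * Written without if/else so the source itself has no branches.
 * Returns 0 for x == 0, so the caller cannot tell "empty" from "bit 0". */
static int msb32(uint32_t x)
{
    int r = 0;
    int s;

    s = (x > 0xFFFFu) << 4; x >>= s; r |= s;  /* is the top bit in the upper 16? */
    s = (x > 0xFFu)   << 3; x >>= s; r |= s;  /* upper 8 of what remains? */
    s = (x > 0xFu)    << 2; x >>= s; r |= s;  /* upper 4? */
    s = (x > 0x3u)    << 1; x >>= s; r |= s;  /* upper 2? */
    r |= (int)(x >> 1);                        /* last remaining bit */
    return r;
}

/* Hypothetical branchless FirstOne(): always scans both halves,
 * then selects one result. The select is where a compiler may
 * (but is not required to) emit a conditional move. */
int first_one_branchless(uint64_t b)
{
    uint32_t hi = (uint32_t)(b >> 32);
    uint32_t lo = (uint32_t)b;

    int hi_idx = msb32(hi) + 32;   /* computed even if hi == 0 */
    int lo_idx = msb32(lo);        /* computed even if hi != 0 */

    return hi ? hi_idx : lo_idx;   /* two-way select, no data-dependent jump needed */
}
```

Whether the select actually becomes a CMOV depends on the compiler, its flags, and the target; checking the generated assembly is the only way to know. The trade-off is the one described above: extra bit-scan work on every call in exchange for potentially no mispredicted branch.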
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.