Computer Chess Club Archives

Subject: Re: cmov isn't necessarily good

Author: Robert Hyatt
Date: 17:12:16 07/22/03

On July 22, 2003 at 20:01:32, Tom Kerrigan wrote:

>On July 22, 2003 at 18:31:51, Robert Hyatt wrote:
>
>>On July 22, 2003 at 17:41:55, Tom Kerrigan wrote:
>>
>>>On July 22, 2003 at 15:27:10, Robert Hyatt wrote:
>>>
>>>>On July 22, 2003 at 15:02:46, Tom Kerrigan wrote:
>>>>
>>>>>On July 22, 2003 at 14:18:23, Robert Hyatt wrote:
>>>>>
>>>>>>>Of course, this is contrary to the point of a conditional move instruction. My
>>>>>>>only comment to that is that Intel must have decided to add the conditional move
>>>>>>>after they were done designing the relevant parts of the core. The decision to
>>>>>>>add the instruction makes sense for forward-compatibility, i.e., "use this
>>>>>>>instruction and you will see a performance improvement with it on later
>>>>>>>processors."
>>>>>>
>>>>>>That could be.  However, the idea was not new.  The alpha did this 10+ years
>>>>>>ago.  So the advantage to a real CMOV implementation should be real.
>>>>>
>>>>>Did I ever say it was new?
>>>>
>>>>I'll play the game.
>>>>
>>>>Did I _say_ you said it was new??
>>>
>>>No, but that's what I read into the "However" part of "However, the idea was not
>>>new." What in the world were you Howevering if you didn't think I thought it was
>>>new? (And BTW, it was new, for the x86.)
>>>
>>>>> Did I say that Intel's implementation is ideal?
>>>>
>>>>Did I say you said it was ideal?
>>>
>>>Similar argument. Another "however."
>>
>>It is cold outside, however I don't like it myself.
>>
>>What does that "however" have to do with implying you said it was
>>cold, _or_ you didn't like the cold?
>>
>>It is just a conjunction to join two sentences together, one of which
>>modifies the idea proposed in the other.
>>
>>
>>
>>>
>>>>If Intel implemented it poorly, when it had been implemented _correctly_ 12
>>>>years previously, then I hardly think "I am being an ass for criticizing them
>>>>for doing it _badly_."
>>>>
>>>>Had they left it out, I _would_ have had complaints, because the alpha has
>>>>an instruction that directly addresses a common operation in C, the conditional
>>>>assignment operator.
>>>>
>>>>So I feel perfectly sane in complaining if they omit something that has been
>>>>around long enough for them to include it, or if they include something but
>>>>implement it poorly so that it doesn't do what the "concept" suggests it
>>>>does, namely eliminate branch mis-predictions by eliminating branches.  It's
>>>>not hard to do this in hardware.  It's been done more than once already.
>>>
>>>Oh, come now. You can't seriously mean this. I'll do you one better--ARM had a
>>>fully predicated ISA in '83, so Intel is a bunch of idiots for not adding full
>>>predication to the 386. (Well, DEC too, for that matter, because they only have
>>>cmove.)
>>>
>>>-Tom
>>
>>I'm not hyped on predication.  CMOV does _not_ have to be implemented in that
>>way.  IE it isn't on the alpha.  It just moves one of two values into a
>
>In what way? In the predication way? All predication means is that instructions
>are executed according to a predicate, i.e., CONDITIONALLY. There is no
>associated implementation, although it's obvious how it would be useful in terms
>of speculative execution. And cmov DOES have to be implemented in the
>"predication" way because that's exactly what cmov means--"conditional move."
>

I'm again talking about a classic "predication" architecture.  CMOV does
not require such a design.  It is a simple gate decision based on a zero/one
that is produced somewhere.  ("is" means "can be" of course, depending on how
it is implemented.)

>>destination, depending on some logical result.  In the hardware I can think of
>>a _trivial_ way to make this happen.  IE a 2-1 demultiplexor.
>
>Sure, if you don't bother to think it through to any degree at all. Demux of
>what? It seems entirely possible that Intel would have had to widen/change the
>format of microops in the P6 to handle both a condition and an operand. That
>would have obviously been a TREMENDOUS change to the core.

In a typical architecture, there are three data paths: two source operands
and a destination operand.  Most instructions pipe two values in on the source
buses, give the ALU some sort of N-bit opcode to specify the particular
operation to do on those two operands, and then the result is gated to the
output bus.  For a CMOV, rather than operating on _both_ buses, you select
one of the two source values according to the condition and gate it to the
destination, ignoring the other.
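
As a rough software model of that datapath (a sketch only, not a description
of any real core): the opcode names, toy_alu() and select32() are hypothetical
names made up for this illustration, and the condition is simply passed in as
an extra zero/nonzero input, since how real hardware routes the flag to the
select is not something this sketch settles.

```c
#include <stdint.h>

/* Hypothetical opcodes for a toy two-source, one-destination datapath. */
typedef enum { OP_ADD, OP_AND, OP_SELECT } toy_op;

/* Branch-free 2-to-1 select: returns a when cond is nonzero, else b. */
static uint32_t select32(uint32_t cond, uint32_t a, uint32_t b)
{
    /* mask is all ones when cond is nonzero, all zeros otherwise */
    uint32_t mask = (uint32_t)0 - (uint32_t)(cond != 0);
    return (a & mask) | (b & ~mask);
}

/* Toy ALU: ordinary opcodes combine both sources; OP_SELECT passes one
   source through unchanged and ignores the other. */
static uint32_t toy_alu(toy_op op, uint32_t src_a, uint32_t src_b,
                        uint32_t flag)
{
    switch (op) {
    case OP_ADD:    return src_a + src_b;
    case OP_AND:    return src_a & src_b;
    case OP_SELECT: return select32(flag, src_a, src_b);
    }
    return 0; /* unknown opcode */
}
```

With this model, toy_alu(OP_SELECT, new_value, old_value, flag) behaves like a
conditional move into the register holding old_value: the destination gets
new_value when the flag is set and keeps old_value otherwise.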






>
>Imagine you're Intel and you've already designed and tested most of the P6
>datapaths (most of which shuttle microops of a certain format around) and you
>find out that the next version of the x86 core will be able to do conditional
>moves. What are you going to do? Make the P6 more compatible with the P4 by
>implementing the instruction as a simple rule in the decoder, or go and change
>the format of your microops, which would force you to touch the core all over
>and completely redo all your testing and validation? I'm sure Intel really
>wishes you were there to let them know they should have done the latter.

You are _assuming_ that was their reasoning.  You _may_ be right.  But,
there _might_ be other reasons.  IE, as I said, CMOV _does_ appear to be a
bit faster.



>
>>So it's "been there, done that, got the T-shirt" for several vendors.  If Intel
>>did it badly, they just did it badly.  However, it is hard to explain how it
>>appears to be faster than a traditional compare and jump...  Yet it was when I
>>did the assembly testing.
>
>Well, I could be wrong about the whole thing. I've read threads where people
>were bitter about cmov running so slowly on P6s despite being fast on Athlons
>and P4s but I've never tested it myself so I can't say anything intelligent
>about your tests. I'm just repeating some information that I heard somewhere
>else, take it or leave it.
>
>-Tom

I simply tested it.  I decided to write a "branchless" FirstOne() function
a while back and posted it here.  Not because I thought it would be a lot
faster (it wasn't) but because I thought it would be illustrative of what a
CMOV can do for code.  The drawback was that it _always_ did two BSF/BSR
instructions.  On the PIV they are not particularly fast, although they are
far better than on the original Pentium.  The plus was that there were potentially
_no_ mispredictions.  I simply assumed that when Intel did it, they did it
reasonably, because others had already done it and done it right.
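
For anyone who wants the shape of the idea, here is a minimal sketch in C. It
is not the function that was posted: the names scan32() and
first_one_branchless() are hypothetical, the bit numbering (bit 0 = least
significant) is an assumption, and the BSF instruction is stood in for by a
de Bruijn table lookup so the code stays portable. Both 32-bit halves are
always scanned, and the final choice is a mask select that a compiler may
turn into a CMOV, although that is not guaranteed.

```c
#include <stdint.h>

/* Index of the lowest set bit of v (0..31), via a de Bruijn multiply.
   Stands in for BSF in this sketch; returns 0 when v is 0. */
static unsigned scan32(uint32_t v)
{
    static const unsigned char pos[32] = {
        0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
        31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
    };
    uint32_t lowest = v & (uint32_t)(0u - v);   /* isolate lowest set bit */
    return pos[(uint32_t)(lowest * 0x077CB531u) >> 27];
}

/* Index of the lowest set bit of a 64-bit word, with no conditional
   branch in the source. The result is only meaningful for nonzero input. */
static unsigned first_one_branchless(uint64_t bits)
{
    uint32_t lo = (uint32_t)bits;
    uint32_t hi = (uint32_t)(bits >> 32);

    unsigned lo_idx = scan32(lo);         /* always computed */
    unsigned hi_idx = scan32(hi) + 32u;   /* always computed */

    /* all ones if the low half has a bit set, else all zeros */
    uint32_t use_lo = (uint32_t)0 - (uint32_t)(lo != 0);
    return (lo_idx & use_lo) | (hi_idx & ~use_lo);
}
```

The cost is the one described above: the second scan is wasted work whenever
the low half already holds the answer, which is the price paid for having no
branch to mispredict.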