Author: Robert Hyatt
Date: 11:18:23 07/22/03
On July 21, 2003 at 19:15:26, Tom Kerrigan wrote:

>On July 20, 2003 at 14:48:09, Robert Hyatt wrote:
>
>>On July 19, 2003 at 02:11:34, Tom Kerrigan wrote:
>>
>>>On July 19, 2003 at 01:11:31, Robert Hyatt wrote:
>>>
>>>>On July 18, 2003 at 15:16:27, Tom Kerrigan wrote:
>>>>
>>>>>On July 18, 2003 at 04:05:52, Walter Faxon wrote:
>>>>>
>>>>>>>; 326 : if (bbHalf) bb0 = bb1; // will code as cmov (ideally)
>>>>>>>
>>>>>>> test ecx, ecx
>>>>>>> je SHORT $L806
>>>>>>> mov eax, DWORD PTR _bb$[esp]
>>>>>>>$L806:
>>>>>>>
>>>>>>
>>>>>>
>>>>>>Stupid compiler, not only no cmov
>>>>>
>>>>>IIRC, on the P6 (Pentium Pro, Pentium II, Pentium III), the cmov instruction
>>>>>gets translated into a string of uOps that's equivalent to testing, branching,
>>>>>and copying.
>>>>>
>>>>>In other words, there is no performance benefit (I believe there may actually be
>>>>>a performance penalty) to using cmov on a P6, and it breaks compatibility with
>>>>>pre-P6 processors, so it's little wonder the P6-era MS compiler doesn't generate
>>>>>cmovs.
>>>>>
>>>>>-Tom
>>>>
>>>>
>>>>I think the point is that the cmov eliminates any possibility of a branch
>>>>mis-prediction. On the long PIV pipeline, that's a significant savings for
>>>>mis-predicted branches.
>>>>
>>>>Since Eugene's example shows that the new MSVC compiler is going to finally
>>>>emit cmov instructions, I'd assume there is a performance gain for doing
>>>>so.
>>>
>>>Yes, of course, I thought I had made it perfectly clear that I was talking about
>>>the _P6_ core. I wrote all of them out. Pentium Pro, Pentium II, Pentium III.
>>>_Not_ Pentium 4.
>>>
>>>-Tom
>>
>>I don't see why it would be worse on a P6 core either. IE on a P6, if the
>>branch is mis-predicted, you _still_ have to back out all the stuff that has
>>been speculatively executed, including any out-of-order stuff as well. The
>>CMOV eliminates a lot of that.
>
>I'm sorry, but can you read at all? This is astounding.
>
>The only point that my original post conveys is that for the _P6_ core the cmov
>instruction gets translated into _branch_ and copy uOps.

I understood that.

>
>You've already managed to miss one key point, namely that I was talking about
>the P6 code. You write a big post about the Pentium 4, which my post was
>obviously not addressing.
>

The point is the same, except the PIV pipeline is longer. There is a big branch mis-prediction penalty on all of these machines, with the PIV having a _bigger_ penalty. I don't see what that changes.

>Now it seems like you've missed the other key point, namely that cmov produces a
>branch uOp, so unless I'm being especially dense, it CAN be mispredicted, just
>like any other branch.

OK. _If_ it _really_ produces a full branch uOp, then you are correct. That seems like an ultra-stupid way of implementing CMOV, however. In fact, it seems like a completely futile way of doing it, since there is no advantage at all.

But in the tests I have _personally_ run, using hand-coded assembler, cmov has always been marginally faster. Never significantly faster, and never slower. Of course, those are just hand-coded tests, as opposed to a huge function where the OOE stuff can run amok.

>
>Of course, this is contrary to the point of a conditional move instruction. My
>only comment to that is that Intel must have decided to add the conditional move
>after they were done designing the relevant parts of the core. The decision to
>add the instruction makes sense for forward-compatibility, i.e., "use this
>instruction and you will see a performance improvement with it on later
>processors."

That could be. However, the idea was not new. The Alpha did this 10+ years ago. So a real CMOV implementation should offer a real advantage.

>
>-Tom
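The quoted source line `if (bbHalf) bb0 = bb1;` is a textbook conditional select. To illustrate what the argument is about, here is a minimal C sketch contrasting the branching form with a branch-free mask-and-blend form that computes the same result without any conditional branch in the source, which is the effect a true cmov provides in hardware. The 32-bit `uint32_t` types (suggested by the `DWORD PTR` in the listing) and the function names `select_branchy` and `select_branchless` are assumptions for illustration, not Walter Faxon's actual code.

```c
#include <stdint.h>

/* Branching form of the quoted source line (hypothetical wrapper).
   A compiler may lower this to test/je/mov, as in the MSVC listing,
   or to a cmov. */
uint32_t select_branchy(int bbHalf, uint32_t bb0, uint32_t bb1)
{
    if (bbHalf)
        bb0 = bb1;
    return bb0;
}

/* Branch-free equivalent (hypothetical): turn the condition into an
   all-ones or all-zeros mask and blend the two inputs. The source has
   no if or ?:, so there is no branch here for the predictor to miss,
   although the compiler is still free to choose any instruction sequence. */
uint32_t select_branchless(int bbHalf, uint32_t bb0, uint32_t bb1)
{
    /* (uint32_t)(bbHalf != 0) is 0 or 1; 0u - 1u wraps to 0xFFFFFFFF. */
    uint32_t mask = (uint32_t)0 - (uint32_t)(bbHalf != 0);
    return (bb1 & mask) | (bb0 & ~mask);
}
```

For example, `select_branchless(1, 0x12345678u, 0x9ABCDEF0u)` returns `0x9ABCDEF0u`, and any nonzero `bbHalf` behaves like 1, matching the `if` form. Which version runs fastest depends on how predictable `bbHalf` is and on the particular CPU, which is exactly what is being debated above.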