Author: Eugene Nalimov
Date: 17:13:04 11/22/05
Go up one level in this thread
On November 22, 2005 at 19:07:16, Dieter Buerssner wrote: >On November 21, 2005 at 20:00:24, Eugene Nalimov wrote: > >>On November 21, 2005 at 18:10:54, Dieter Buerssner wrote: >> >>>[...] >>>I guess, you mean this as a substitution for >>> if (depth < 0) >>> fm = fm1; >>> else >>> fm = fm2; >>> >>>I am surprised, that compilers are not able to do this themselves. I >> >>I several times tried to modify Visual C to recognize additional cases where we >>should emit conditional moves (last time was probably a year ago for >>x64-targeting compiler). Every time I could demonstrate win on a small >>artificial test case, but every large real world program either showed no gain >>or slowed down. >> >>I suspect there are several reasons for this: >>* branch predictors are good, and majority of branches can be correctly >>predicted >>* CMOV is long instruction; short branch is shorter, so program with less CMOVs >>fits better into cache >>* there is no 8-bit form of CMOV >>* >* for invalid address "CMOV reg, memory" will give you access violation even if >>condition is false. > >I don't really understand the last reason. > >It might be possible, to detect (guess) cases, that are the real inner loops. >That should practically get rid of "CMOV is long instruction; short branch is >shorter, so program with less CMOVs fits better into cache". For lot of programs there are no inner loops where program spends majority of its time. Examples are operating systems, databases, compilers, office applications, etc. For such programs profile is flat -- i.e. you don't have 3 functions that collectively use (say) 70% of total execution time. Instead you have 150 "hot" functions, where hottest takes less than 2% of total time, and majority take 0.5-1%. There are no hot loops -- typical number of iterations is less than 5. That is very compiler-unfriendly situation; you cannot just generate locally optimal code ignoring its size. For such cases best approach is to generate reasonable (though not locally fastest) code, and carefully try to generate smallest possible code. You have to carefuly limit optimizations increasing code size such as loop unrolling, inlining, etc. That is exactly situation where Visual C shines -- our main customer is Microsoft itself. Windows, IE, MS Office, MS SQL Server, etc. are all such applications. I heard one anecdotal evidence about customer who was unhappy with code we generates for their server application, so they spend lot of resources migrating to the compiler provided by (famous) CPU vendor. Resulting executable run much slower than executable produced by Visual C; difference was tens of percents. Generating CMOVs for such applications can grow code size by (say) 1-2%. It happens that less branch mispredicts does not compensate for more frequent I-cache misses. (The most pathological case I analyzed was one of the SpecInt2k benchmarks. Hottest function in the program is huge recursive function. It containes one loop. That loop contains switch statement, and there are lot of recursive calls inside that switch. Average number of loop iterations is less than 2. As a result, the more code you hoist out of the loop, the slower program becomes -- function prologue is (almost) hottest code in the function). >About "there is no 8-bit form of CMOV". If that really hurts, it should be >rather easy, to only use them in the other cases? That is exactly what compilers are doing. I was just pointing that there are omissions making CMOVs less useful. >Thanks for your interesting input, >Dieter
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.