Author: Matt Taylor
Date: 13:43:31 12/19/02
On December 19, 2002 at 16:26:27, Eugene Nalimov wrote:

>In Visual C you can mark function as "__forceinline". Compiler will not apply
>its heuristics, it will always try to inline the function.
>
>Thanks,
>Eugene

In VC 5 and 6? What about the /Ob2 option?

My biggest gripe was and is that VC has no way of optimizing register allocation for inline assembly. My second biggest gripe is that it really doesn't optimize the inline assembly. That seems paradoxical, as the purpose of inline assembly is to hand-optimize, but I can't write correct spin loops or many other constructs in plain C. I can't write MMX/SSE code in plain C without the Intel intrinsic library. I don't think VC 7 even supports the Intel intrinsics, but you would know better than I would.

I think VC's mechanism here is easier to use, but it's much more difficult to optimize. I chose a sort of mix. I declare "register variables" inside an assembly block. You can map them on to variables inside the function, but otherwise they simply map as temporary registers. You tell the compiler which registers it can pick, and it will attempt to optimize register reloads across your inline assembly.

-Matt

>On December 19, 2002 at 16:16:30, Matt Taylor wrote:
>
>>On December 19, 2002 at 11:40:11, Robert Hyatt wrote:
>>
>>>For those remembering the discussion from a couple of weeks ago, I
>>>had run into a strange problem with getting an inline asm lock to
>>>work. I was playing with this because I was following the Intel
>>>guideline of adding a "pause" to the "shadow-lock" part of the code.
>>>
>>>First, the lock now works. The bug was on the **** line above. I
>>>had incorrectly written "jz". To explain the code first, read on...
>>
>>Ah, you had us all hoping for more elusive compiler/assembler/hardware bugs. I
>>guess it always helps to check the algorithm.
>>
>>>Why is that interesting? It suggests that for the most part, when this code
>>>is called, the lock is zero. Which is what I had thought all along with just
>>>four threads running. This means that the _locks_ are not really affecting
>>>my search speed, contrary to what "some" would like to suggest. If I were to
>>>use 16 processors, I'm sure it would happen more often. But for now, the locks
>>>do not appear to be a performance bottleneck.
>>>
>>>BTW the above code works fine if you are running linux. Or using gcc/gas on
>>>other platforms. It is not quite microsoft syntax as MS reverses the
>>>operands to dst, src rather than the ATT approach of src, dst. Also there
>>>are other syntactical issues dealing with [] vs () and $constant and so forth.
>>
>>You mean Intel syntax. Intel deserves sole credit for their backward syntax.
>>Microsoft just mimicked. :-P
>>
>>>I think I am next going to work on making the other asm all work by inlining
>>>it to dump a lot of call instruction overhead that is scattered around...
>>
>>It's too bad the compiler can't make those decisions for you. My past
>>experiences with the VC optimizer and inline assembly, particularly with inline
>>functions, have been less than peachy. The gcc mechanism works a lot better.
>>Still, it would be yet nicer if you could tell the linker, "Inline any function
>>smaller than X bytes. Also, inline function Y because it gets called often."
>>
>>The only thing that could beat that would be a linker that employed even MORE
>>intelligence and did profiling too.
>>
>>-Matt
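
For readers who have not seen the pattern under discussion, here is a rough sketch of a "shadow lock" spin loop with a "pause" in the read-only wait. It is not Robert Hyatt's code, which is not reproduced in this post. It assumes a C11 compiler with <stdatomic.h>, POSIX threads, and gcc-style inline asm on x86. All names (spin_lock_t, cpu_relax, spin_lock, spin_unlock, run_demo) are made up for illustration.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical lock type: 0 means free, 1 means held. */
typedef struct { atomic_int locked; } spin_lock_t;

/* Spin-wait hint. On x86 this emits "pause" through gcc-style inline asm;
   on other targets (an assumption of this sketch) it does nothing. */
static inline void cpu_relax(void) {
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("pause" ::: "memory");
#endif
}

static void spin_lock(spin_lock_t *l) {
    for (;;) {
        /* Shadow part: wait with plain loads until the lock looks free. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed) != 0)
            cpu_relax();
        /* Real attempt: atomic exchange. An old value of 0 means we own it. */
        if (atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) == 0)
            return;
    }
}

static void spin_unlock(spin_lock_t *l) {
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}

enum { THREADS = 4, ITERS = 100000 };

static spin_lock_t demo_lock;   /* static storage, so it starts unlocked */
static long demo_counter;       /* protected by demo_lock */

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < ITERS; i++) {
        spin_lock(&demo_lock);
        demo_counter++;
        spin_unlock(&demo_lock);
    }
    return NULL;
}

/* Runs THREADS workers and returns the final counter value. */
static long run_demo(void) {
    pthread_t t[THREADS];
    demo_counter = 0;
    for (int i = 0; i < THREADS; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < THREADS; i++)
        pthread_join(t[i], NULL);
    return demo_counter;
}
```

Calling run_demo() from main (built with -pthread) should return THREADS * ITERS if the lock excludes correctly. The asm string is a single instruction with no operands, so the AT&T versus Intel operand-order question does not come up here.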
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.