Author: Matt Taylor
Date: 12:10:53 01/20/03
On January 20, 2003 at 14:45:45, Gerd Isenberg wrote:

>On January 19, 2003 at 14:01:56, Matt Taylor wrote:
>
>>Interesting to note that several of those routines rely on technically undefined
>>behavior. Under the bsf instruction, the manual states that, "...If the contents
>>of the source operand are 0, the contents of the destination operand are
>>undefined." Conveniently it seems that this works on all existing
>>implementations.
>>
>>A similar trick can be used with shifts. Integer shift instructions mask their
>>shift count to avoid unnecessary work. As a result, shifting by 32 does not
>>change the destination operand.
>>
>>I probably won't optimize your code for Pentium 4. I was rather annoyed when
>>some code I wrote executed about as fast on my Pentium 90 as it would on a
>>high-end Pentium 4. All the old tricks are now expensive. Shifting is 4 clocks
>>latency. The full adder (adc/sbb) is 2-3 clocks -throughput-. Latency is 6-8
>>clocks. The setcc instruction is 5 clocks latency. Every one of these
>>instructions has a latency of 1 on Athlon and the original Pentium. They all
>>execute with a throughput of up to 3 instructions per cycle (1/3) on Athlon and
>>2 instructions per cycle (1/2) on original Pentium. Sigh.
>>
>>I'll optimize it for Athlon since I am now most familiar with its rules, and I
>>have tools to analyze the code. Taking a look now...
>>
>>-Matt
>
<snip code>
>This works fine so far with MSC6.0. But if i try to use __fastcall, to force
>parameter passing via register (first is ecx by convention which would be rather
>fine here), the following function succs in release mode.
>
<snip code>
>I found no way so far, to force the compiler with inlined asm-routines, to pass
>a parameter via ecx-register. Same for the asm bsf-routines and others. I always
>have not necessary store/loads in the begginning of those functions.
>
<snip code>
>Any hint obout this?
>
>Thanks in advance,
>Gerd

I've never found a way around this limitation.
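To make the limitation concrete, here is a rough sketch of the kind of routine being discussed. It is not the snipped code: the name lowest_bit and the portable fallback are assumptions added for illustration. On 32-bit MSVC, __fastcall delivers the first argument in ecx, but referring to the parameter by name inside the __asm block appears to make the compiler give it a stack slot, which matches the unnecessary store/load pair reported above. The assert documents the bsf precondition (a zero source leaves the destination undefined).

```c
#include <assert.h>

/* Hypothetical example, not from the thread: index of the lowest set bit. */
#if defined(_MSC_VER) && defined(_M_IX86)

/* 32-bit MSVC only: x arrives in ecx under __fastcall. */
int __fastcall lowest_bit(unsigned x)
{
    int r;
    /* bsf leaves its destination undefined for a zero source. */
    assert(x != 0);
    /* Naming x and r here is what forces them through memory. */
    __asm {
        mov eax, x
        bsf eax, eax
        mov r, eax
    }
    return r;
}

#else

/* Portable fallback so the sketch builds on other compilers and targets. */
int lowest_bit(unsigned x)
{
    int i = 0;
    assert(x != 0);
    while ((x & 1u) == 0) {   /* shift right until bit 0 is set */
        x >>= 1;
        ++i;
    }
    return i;
}

#endif
```

Both branches expect a nonzero argument, so a caller that may see an empty bitboard has to test for zero first.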
My best answer is that the redundant loads/stores could be removed by a post-optimizer. I have tools that would let me write such a program easily, but the tools are only half-working. I need to be able to reverse engineer an executable file reliably -- right now I miss some sections of code completely, and finding them does not look like an easy task. Once that is done, it would be trivial to inline functions and do some simple load/store optimization.

-Matt
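For readers wondering what "simple load/store optimization" might mean in practice, one possibility is a peephole pass that forwards a stored register to an immediately following load of the same stack slot. The sketch below is only an assumption about the general idea, not the tools described in the post: struct insn, the opcode names, and forward_stores are all made up for illustration, and a real pass would work on decoded x86 instructions rather than this toy record.

```c
#include <assert.h>

/* Hypothetical, simplified instruction record (not a real x86 decoder). */
enum op { OP_STORE, OP_LOAD, OP_MOVE, OP_OTHER };

struct insn {
    enum op op;
    int reg;   /* OP_STORE: register stored; OP_LOAD/OP_MOVE: destination */
    int src;   /* OP_MOVE: source register (unused otherwise) */
    int slot;  /* OP_STORE/OP_LOAD: stack slot (unused otherwise) */
};

/*
 * Rewrite "store reg -> slot" immediately followed by "load slot -> reg2"
 * so that the load becomes "move reg2 <- reg". The store itself is kept,
 * because later code may still read the slot.
 * Returns the number of loads rewritten.
 */
int forward_stores(struct insn *code, int n)
{
    int i;
    int changed = 0;

    assert(n >= 0);
    for (i = 0; i + 1 < n; ++i) {
        if (code[i].op == OP_STORE &&
            code[i + 1].op == OP_LOAD &&
            code[i].slot == code[i + 1].slot) {
            code[i + 1].op = OP_MOVE;
            code[i + 1].src = code[i].reg;
            ++changed;
        }
    }
    return changed;
}
```

Only adjacent pairs are matched here, which avoids any need to prove that nothing in between writes the slot or the register. Removing the now-dead store would need a further check that the slot is never read again.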
Last modified: Thu, 15 Apr 21 08:11:13 -0700