Author: Bo Persson
Date: 10:55:04 03/15/01
On March 15, 2001 at 00:18:44, Pham Minh Tri wrote:

>Hi all,
>
>In the MS-DOS environment, I wrote some pieces of code in assembly to speed up my program, and I could double its speed that way. However, when I moved my code to Windows and the VC6.0 compiler, that method of speedup did not work. I tried several times and measured that the program with some assembly pieces of code could be slightly slower than one without them (pure C++, optimization option: maximize speed). An annoying result, isn't it? In the end I gave up, with the following conclusions:
>
>1) My experience of writing and optimising assembly is with 16-bit code, not good enough for 32-bit code with its many new instructions. However, I somewhat doubt this conclusion, because I designed all data structures to suit 16 bits (I did not use the Bitboard structure), and many 16-bit instructions are not slower than 32-bit ones (and some may even be quicker).

In a 32-bit environment (like Win 9x/NT/etc.), instructions operating on 16-bit data *are* slower, because they need a size prefix to override the default operand size.

Intel used a *trick* when they went from 16-bit code to 32-bit code: the same instructions operate on 16- or 32-bit data depending on what kind of code segment they belong to. An instruction decides whether it is working on a byte or a word; the size of the word, however, is decided somewhere else (in the segment descriptor). In a 16-bit segment, instructions operate on 8 or 16 bits. In a 32-bit segment, the same instructions operate on 8 or 32 bits.

So, if you use the "wrong" word size in an instruction, the compiler or the assembler(!) will add a size override prefix byte to each such instruction. This makes the instructions longer and slower, and it might also reduce the efficiency of the out-of-order execution in the PII/PIII generation.
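To make the prefix cost concrete, here is a minimal sketch (the function names are made up, and the encodings in the comments show a typical 32-bit compilation; a real compiler may instead widen the arithmetic to 32 bits and truncate afterwards):

    // In a 32-bit code segment, instructions that really operate on
    // 16-bit registers need the 66h operand-size override prefix:
    unsigned short add16(unsigned short a, unsigned short b)
    {
        return a + b;   // e.g. 66 03 C2   add ax, dx    (one byte longer)
    }

    unsigned int add32(unsigned int a, unsigned int b)
    {
        return a + b;   // e.g. 03 C2      add eax, edx  (no prefix needed)
    }

The C++ source differs only in the types, but every instruction that actually touches 16-bit registers in the first function carries the extra prefix byte.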
>2) The MS VC6.0 compiler may do the best possible optimisation for speed. That means there is very little more we could do.

Yes, in most cases the compiler generates very good code. You can ask it for a listing of intermixed source code and assembly and see what it does. Even being an old assembly programmer, I am often impressed by the way the compiler improves the instruction scheduling, doing several things in parallel by interleaving operations from several C statements.

Even if you find some code that is less than optimal, you could probably fix it by tuning your inline functions and get a global speedup, instead of fixing one function at a time.
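With the command-line compiler, the listing mentioned above can be requested with the /FAs switch (an assembly listing with the C++ source lines interleaved as comments). For example:

    cl /O2 /FAs chess.cpp

This writes chess.asm next to the source file (chess.cpp is just a placeholder name here); /O2 is the "maximize speed" option.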
There is also another problem with inline assembly: it might interfere with the compiler's own optimizations. If you, the programmer, "steal" a few registers in the middle of a function, the compiler may have to produce worse code for the rest of the function!
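A small sketch of the effect, using the VC6 __asm keyword (the function itself is made up for illustration):

    // Hypothetical example: the __asm block claims EAX and EDX, so the
    // optimizer cannot keep its own values in those registers here and
    // may have to spill them to memory around this point.
    int asm_add(int a, int b)
    {
        int result;
        __asm {
            mov  eax, a      // take over EAX...
            mov  edx, b      // ...and EDX
            add  eax, edx
            mov  result, eax
        }
        return result;
    }

If something like this sits inside a hot loop, the spills can easily cost more than the hand-written instructions save.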
>3) Many chess functions that we might convert into assembly code are not complicated, so a good compiler like VC6.0 can optimise them as well as an expert. People can do better in more complicated and graphics-heavy applications.
>
>4) Because of all the above, if I insist, I could get a speedup of 3-5% after a huge effort. That is much more expensive than speeding up in other ways.

Yes, it is probably better to invest the effort in improving the algorithms used. The only good use I have seen for assembly is getting access to the BSR/BSF instructions for finding bits in a bitmap. Even though this use makes the compiler store the values in a temporary, there is still a net gain (on Intel processors).
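For example, a least-significant-bit routine along these lines (a sketch; the name FirstOne and the 32-bit mask are my choices for illustration):

    // Index of the lowest set bit of a non-zero 32-bit mask, via BSF.
    // The result is undefined for mask == 0: BSF then sets ZF and
    // leaves the destination register undefined.
    int FirstOne(unsigned long mask)
    {
        int index;               // the temporary mentioned above
        __asm {
            bsf  eax, mask
            mov  index, eax
        }
        return index;
    }

A 64-bit bitboard needs two such scans on a 32-bit processor, but as noted above there is still a net gain on Intel hardware.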
>Just my thought and experience. Do you get better results or other experiences?
>
>Pham

Bo Persson
bop@malmo.mail.telia.com