Author: Bo Persson
Date: 10:55:04 03/15/01
On March 15, 2001 at 00:18:44, Pham Minh Tri wrote:

>Hi all,
>
>In the MS-DOS environment, I wrote some pieces of code in assembly to speed up my program, and I could double its speed that way. However, when I moved my code to Windows and the VC6.0 compiler, that method of speedup did not work. I tried several times and measured that the program with some assembly pieces of code could be slightly slower than one without them (pure C++, optimization option: maximize speed). An annoying result, isn't it? In the end I gave up, with the following conclusions:
>
>1) My experience of writing and optimising assembly is with 16-bit code, not good enough for 32-bit code with its many new instructions. However, I somewhat doubt this conclusion, because I designed all data structures to suit 16 bits (I did not use the Bitboard structure), and many 16-bit instructions are not slower than 32-bit ones (and some may even be quicker).

In a 32-bit environment (like Win 9x/NT/etc.), instructions operating on 16-bit data *are* slower, because they need a size prefix to override the default operand size.

Intel used a *trick* when they went from 16-bit code to 32-bit code: the same instructions operate on 16- or 32-bit data depending on what kind of code segment they belong to. An instruction decides whether it is working on a byte or a word; the size of the word, however, is decided somewhere else (in the segment descriptor). In a 16-bit segment, instructions operate on 8 or 16 bits. In a 32-bit segment, the same instructions operate on 8 or 32 bits.

So, if you use the "wrong" word size in an instruction, the compiler or the assembler(!) will add a size override prefix byte to each such instruction. This makes the instructions longer and slower, and it might also reduce the efficiency of the out-of-order execution in the PII/PIII generation.
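To make the prefix cost concrete, here is a minimal sketch (the function names are made up, and the encodings in the comments show a typical 32-bit compilation; a real compiler may instead widen the arithmetic to 32 bits and truncate afterwards):

    // In a 32-bit code segment, instructions that really operate on
    // 16-bit registers need the 66h operand-size override prefix:
    unsigned short add16(unsigned short a, unsigned short b)
    {
        return a + b;   // e.g. 66 03 C2   add ax, dx    (one byte longer)
    }

    unsigned int add32(unsigned int a, unsigned int b)
    {
        return a + b;   // e.g. 03 C2      add eax, edx  (no prefix needed)
    }

The C++ source differs only in the types, but every instruction that actually touches 16-bit registers in the first function carries the extra prefix byte.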
>2) The MS VC6.0 compiler may do the best possible optimisation for speed. That means there is very little more we could do.

Yes, in most cases the compiler generates very good code. You can ask it for a listing of intermixed source code and assembly and see what it does. Even being an old assembly programmer, I am often impressed by the way the compiler improves the instruction scheduling, doing several things in parallel by interleaving operations from several C statements.

Even if you find some code that is less than optimal, you could probably fix it by tuning your inline functions and get a global speedup, instead of fixing one function at a time.
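With the command-line compiler, the listing mentioned above can be requested with the /FAs switch (an assembly listing with the C++ source lines interleaved as comments). For example:

    cl /O2 /FAs chess.cpp

This writes chess.asm next to the source file (chess.cpp is just a placeholder name here); /O2 is the "maximize speed" option.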
There is also another problem with inline assembly: it might interfere with the compiler's own optimizations. If you, the programmer, "steal" a few registers in the middle of a function, the compiler may have to produce worse code for the rest of the function!
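A small sketch of the effect, using the VC6 __asm keyword (the function itself is made up for illustration):

    // Hypothetical example: the __asm block claims EAX and EDX, so the
    // optimizer cannot keep its own values in those registers here and
    // may have to spill them to memory around this point.
    int asm_add(int a, int b)
    {
        int result;
        __asm {
            mov  eax, a      // take over EAX...
            mov  edx, b      // ...and EDX
            add  eax, edx
            mov  result, eax
        }
        return result;
    }

If something like this sits inside a hot loop, the spills can easily cost more than the hand-written instructions save.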
>3) Many chess functions that we might convert into assembly code are not complicated, so a good compiler like VC6.0 can optimise them as well as an expert. People can do better in more complicated and graphics-heavy applications.
>
>4) Because of all the above, if I insist, I could get a speedup of 3-5% after a huge effort. That is much more expensive than speeding up in other ways.

Yes, it is probably better to invest the effort in improving the algorithms used. The only good use I have seen for assembly is getting access to the BSR/BSF instructions for finding bits in a bitmap. Even though this use makes the compiler store the values in a temporary, there is still a net gain (on Intel processors).
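For example, a least-significant-bit routine along these lines (a sketch; the name FirstOne and the 32-bit mask are my choices for illustration):

    // Index of the lowest set bit of a non-zero 32-bit mask, via BSF.
    // The result is undefined for mask == 0: BSF then sets ZF and
    // leaves the destination register undefined.
    int FirstOne(unsigned long mask)
    {
        int index;               // the temporary mentioned above
        __asm {
            bsf  eax, mask
            mov  index, eax
        }
        return index;
    }

A 64-bit bitboard needs two such scans on a 32-bit processor, but as noted above there is still a net gain on Intel hardware.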
>Just my thought and experience. Do you get better results or other experiences?
>
>Pham

Bo Persson
bop@malmo.mail.telia.com