Computer Chess Club Archives


Subject: Re: assembler vs. C

Author: Bo Persson

Date: 13:38:27 11/09/99


On November 08, 1999 at 17:36:10, Bruce Moreland wrote:

>On November 08, 1999 at 14:55:47, Ratko V Tomic wrote:
>
>>> I agree that a good compiler (like MSVC) produces code on par with a good
>>> assembly coder.
>>
>>Not even close. Namely, what you were comparing was:
>>
>>> I too have studied the code listings and tried to improve
>>> it. Generally, there are only one or two instructions that can be tweaked,
>>> and the speedup has never been large enough to be really measurable.
>>
>>That is not assembly coding (much less good assembly coding) but trying to
>>hand-simulate a C compiler (and as you noted, that's mostly a waste of time).
>>
>>A native assembly program (not tweaked C compiler output) is, for any
>>reasonably complex data manipulation (i.e. for other than just copying arguments
>>to struct fields or pushing arguments and calling a function or other simple
>>stuff), at least a couple of times faster than a compiler's output. In my
>>consulting work I have over the years many times optimized existing C/C++ code
>>(for graphics, compression, encryption, search) by switching to native assembly
>>language algorithms in the critical portions of the task, ending up with 3-5
>>times faster critical sections.
>>
>>When you tailor the algorithm to the CPU architecture, you can use it much
>>better, knowing well its strengths and its limitations, than if you tailor it to
>>a virtual C architecture and the compiler mechanically translates it to the
>>actual CPU model.
>
>It is harder for chess since the programs tend to be very very heavily optimized
>already, and you don't tend to find particularly bad bottlenecks.
>
>When I optimize performance I try to find the one function that is consuming all
>of the time.  It is very hard when you have 20 functions, each of which is
>consuming 5% of the time, and some guy has spent a week fiddling with each one
>trying to make it faster already.
>
>bruce

I agree totally  :-)

In my case I start with tuned C++, where I have checked the code produced by the
compiler and tweaked the C++ code until I am satisfied with the resulting code.
This includes inlined functions producing 2-4 x86 instructions each. No call
overhead, no nothing.
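
To illustrate the kind of small inlined function meant here, a minimal sketch
follows. The names (Bitboard, square_bit, set_square, clear_square,
is_occupied) and the bit layout are assumptions made for illustration, not
taken from the actual program, and the exact instruction count will depend on
the compiler and target.

```cpp
#include <cstdint>

// Assumed layout for this sketch: one bit per square, square 0 = bit 0.
using Bitboard = std::uint64_t;

// Each helper is a single expression, so an optimizing compiler can
// usually expand it in place at the call site instead of emitting a call.
constexpr Bitboard square_bit(int sq) { return Bitboard{1} << sq; }

constexpr Bitboard set_square(Bitboard b, int sq) { return b | square_bit(sq); }

constexpr Bitboard clear_square(Bitboard b, int sq) { return b & ~square_bit(sq); }

constexpr bool is_occupied(Bitboard b, int sq) { return (b & square_bit(sq)) != 0; }
```

Whether a given helper really compiles down to a few instructions is best
checked the same way as described above: by reading the generated listing.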

I have seen cases where MSVC does bitboard operations entirely in registers, by
first inlining the operations and then doing register allocation. It can AND two
bitboards together and extract the bits without storing anything to memory. The
code looks extremely compact and fast; I cannot even imagine making it 10x
faster.
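
A rough sketch of the pattern described above (AND two bitboards, then walk
the set bits) might look like this. The names lowest_square and
for_each_common_square are hypothetical, and the bit scan is written as a
portable loop rather than with any particular compiler intrinsic; it is not
the code from the original program.

```cpp
#include <cstdint>

using Bitboard = std::uint64_t;  // assumed: one bit per square

// Index of the lowest set bit. Precondition: b != 0.
inline int lowest_square(Bitboard b) {
    int sq = 0;
    while ((b & 1) == 0) {
        b >>= 1;
        ++sq;
    }
    return sq;
}

// Calls visit(sq) for every square set in both boards, lowest first.
// The intersection lives in a local variable, which leaves the compiler
// free to keep it in registers once everything is inlined.
template <typename Visitor>
inline void for_each_common_square(Bitboard a, Bitboard b, Visitor visit) {
    Bitboard both = a & b;
    while (both != 0) {
        visit(lowest_square(both));
        both &= both - 1;  // clear the lowest set bit
    }
}
```

Passing a small lambda as the visitor keeps the loop body visible to the
compiler at the call site, which is what gives the inliner a chance to remove
the call overhead.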

To leonid: if my inlined functions result in four or fewer x86 instructions each,
how are you ever going to make them 10 times faster?


Bo Persson
bop@malmo.mail.telia.com





Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.