Computer Chess Club Archives

Subject: Re: assembler vs. C

Author: leonid
Date: 19:55:00 11/10/99
On November 09, 1999 at 16:38:27, Bo Persson wrote:

>On November 08, 1999 at 17:36:10, Bruce Moreland wrote:
>
>>On November 08, 1999 at 14:55:47, Ratko V Tomic wrote:
>>
>>>> I agree that a good compiler (like MSVC) produces code on par with a good
>>>> assembly coder.
>>>
>>>Not even close. Namely, what you were comparing was:
>>>
>>>> I too have studied the code listings and tried to improve
>>>> it. Generally, there are only one or two instructions that can be tweaked,
>>>> and the speedup has never been large enough to be really measurable.
>>>
>>>That is not assembly coding (much less good assembly coding) but trying to
>>>hand-simulate a C compiler (and as you noted, that's mostly a waste of time).
>>>
>>>A native assembly program (not tweaked C compiler output) is, for any
>>>reasonably complex data manipulation (i.e. anything other than copying arguments
>>>to struct fields, pushing arguments and calling a function, or other simple
>>>stuff), at least a couple of times faster than a compiler's output. In my
>>>consulting work I have over the years many times optimized existing C/C++ code
>>>(for graphics, compression, encryption, search) by switching to native assembly
>>>language algorithms in the critical portions of the task, ending up with
>>>critical sections that run 3-5 times faster.
>>>
>>>When you tailor the algorithm to the CPU architecture, knowing its strengths
>>>and its limitations well, you can use the CPU much better than if you tailor
>>>the algorithm to a virtual C architecture and let the compiler mechanically
>>>translate it to the actual CPU model.
>>
>>It is harder for chess since the programs tend to be very very heavily optimized
>>already, and you don't tend to find particularly bad bottlenecks.
>>
>>When I optimize performance I try to find the one function that is consuming all
>>of the time.  It is very hard when you have 20 functions, each of which is
>>consuming 5% of the time, and some guy has already spent a week fiddling with
>>each one trying to make it faster.
>>
>>bruce
>
>I agree totally  :-)
>
>In my case I start with tuned C++, where I have checked the code produced by the
>compiler and tweaked the C++ code until I am satisfied with the resulting code.
>This includes inlined functions producing 2-4 x86 instructions each. No call
>overhead, no nothing.
>
>I have seen cases where MSVC does bitboard operations entirely in registers, by
>first inlining the operations and then doing register allocation. It can AND two
>bitboards together and extract the bits without storing anything to memory. The
>code looks extremely compact and fast; I cannot even imagine making it 10x
>faster.
>
>To leonid: if my inlined functions result in 4 or fewer x86 instructions each,
>how are you ever going to make it 10 times faster?
>
>
>Bo Persson
>bop@malmo.mail.telia.com

If you have a factor of 4, it is good enough.

Leonid.
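
For readers following the thread: the inlined bitboard operations Bo Persson describes (AND two bitboards, then extract the set bits) might look roughly like the sketch below. This is not his code. The names `Bitboard`, `intersect`, `lowest_square`, `clear_lowest` and `for_each_square` are hypothetical, the bit-to-square mapping is assumed, and the bit scan is a plain portable loop rather than whatever instruction sequence a given compiler would emit.

```cpp
#include <cstdint>

// Assumption: one bit per square, bit 0 = first square, bit 63 = last.
using Bitboard = std::uint64_t;

// Squares present on both boards: a single AND on the 64-bit value.
inline Bitboard intersect(Bitboard a, Bitboard b) {
    return a & b;
}

// Index of the lowest set bit. Portable loop for illustration only;
// precondition: b != 0.
inline int lowest_square(Bitboard b) {
    int sq = 0;
    while ((b & 1u) == 0) {
        b >>= 1;
        ++sq;
    }
    return sq;
}

// Clear the lowest set bit (b & (b - 1) drops exactly that bit).
inline Bitboard clear_lowest(Bitboard b) {
    return b & (b - 1);
}

// Call f(square) for every set bit, lowest first. Being a template,
// this can be inlined together with f, so the loop state may stay
// in registers; whether it does depends on the compiler.
template <class F>
inline void for_each_square(Bitboard b, F f) {
    for (; b != 0; b = clear_lowest(b)) {
        f(lowest_square(b));
    }
}
```

A call such as `for_each_square(intersect(attacks, targets), handler)` would then visit each common square once, where `attacks`, `targets` and `handler` are again placeholder names.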
Last modified: Thu, 15 Apr 21 08:11:13 -0700