Author: leonid
Date: 19:55:00 11/10/99
On November 09, 1999 at 16:38:27, Bo Persson wrote:

>On November 08, 1999 at 17:36:10, Bruce Moreland wrote:
>
>>On November 08, 1999 at 14:55:47, Ratko V Tomic wrote:
>>
>>>> I agree that a good compiler (like MSVC) produces code on par with a good
>>>> assembly coder.
>>>
>>>Not even close. Namely, what you were comparing was:
>>>
>>>> I too have studied the code listings and tried to improve
>>>> it. Generally, there are only one or two instructions that can be tweaked,
>>>> and the speed-up has never been large enough to be really measurable.
>>>
>>>That is not assembly coding (much less good assembly coding) but trying to
>>>hand-simulate a C compiler (and as you noted, that's mostly a waste of time).
>>>
>>>A native assembly program (not tweaked C compiler output) is, for any
>>>reasonably complex data manipulation (i.e. for other than just copying arguments
>>>to struct fields or pushing arguments and calling a function or other simple
>>>stuff), at least a couple of times faster than a compiler's output. In my
>>>consulting work I have, over the years, many times optimized existing C/C++ code
>>>(for graphics, compression, encryption, search) by switching to native assembly
>>>language algorithms in the critical portions of the task, ending up with 3-5
>>>times faster critical sections.
>>>
>>>When you tailor the algorithm to the CPU architecture, you can use it much
>>>better, knowing well its strengths and its limitations, than if you tailor it to
>>>a virtual C architecture and the compiler mechanically translates it to the
>>>actual CPU model.
>>
>>It is harder for chess since the programs tend to be very very heavily optimized
>>already, and you don't tend to find particularly bad bottlenecks.
>>
>>When I optimize performance I try to find the one function that is consuming all
>>of the time. It is very hard when you have 20 functions, each of which is
>>consuming 5% of the time, and some guy has spent a week fiddling with each one
>>trying to make it faster already.
>>
>>bruce
>
>I agree totally :-)
>
>In my case I start with tuned C++, where I have checked the code produced by the
>compiler and tweaked the C++ code until I am satisfied with the resulting code.
>This includes inlined functions producing 2-4 x86 instructions each. No call
>overhead, no nothing.
>
>I have seen cases where MSVC does bitboard operations entirely in registers, by
>first inlining the operations and then doing register allocation. It can AND two
>bitboards together and extract the bits without storing anything to memory. The
>code looks extremely compact and fast; I cannot even imagine making it 10x
>faster.
>
>To leonid: if my inlined functions result in 4 or fewer x86 instructions each,
>how are you ever going to make it 10 times faster?
>
>
>Bo Persson
>bop@malmo.mail.telia.com

If you have a factor of 4, it is good enough.

Leonid.
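For readers unfamiliar with the bitboard operations discussed in the quoted text, here is a minimal C++ sketch of the kind of small inlined helpers being described: AND two bitboards together, then extract the set bits one at a time. It is not code from anyone in this thread. The names `Bitboard`, `intersect`, `clear_lowest`, `lowest_index` and `common_squares` are hypothetical, and the bit scan is a plain portable loop rather than whatever a given compiler or hand-written assembly would use. How many instructions each helper becomes depends on the compiler and target; on a 32-bit x86 a 64-bit value occupies two registers.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical type name: one bit per square of the board.
using Bitboard = std::uint64_t;

// Squares set in both boards: a plain bitwise AND.
inline Bitboard intersect(Bitboard a, Bitboard b) { return a & b; }

// Return b with its lowest set bit cleared.
inline Bitboard clear_lowest(Bitboard b) { return b & (b - 1); }

// Index (0..63) of the lowest set bit. Precondition: b is nonzero.
// Portable loop for illustration only, not a tuned bit scan.
inline int lowest_index(Bitboard b) {
    assert(b != 0);
    int i = 0;
    while ((b & 1) == 0) {
        b >>= 1;
        ++i;
    }
    return i;
}

// Collect the indices of all squares set in both boards, lowest first.
inline std::vector<int> common_squares(Bitboard a, Bitboard b) {
    std::vector<int> out;
    for (Bitboard x = intersect(a, b); x != 0; x = clear_lowest(x)) {
        out.push_back(lowest_index(x));
    }
    return out;
}
```

For example, `common_squares(0xF0, 0x3C)` walks the intersection `0x30` and yields the indices 4 and 5. Whether the intermediate values stay in registers, as described above for MSVC, is up to the optimizer and would have to be checked in the generated listing.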
Last modified: Thu, 15 Apr 21 08:11:13 -0700