Computer Chess Club Archives


Subject: Re: Speedups for BitBoard programs on 64-bit machines

Author: Robert Hyatt

Date: 07:44:47 06/09/02


On June 09, 2002 at 02:11:20, Uri Blass wrote:

>On June 08, 2002 at 22:15:20, Robert Hyatt wrote:
>
>>On June 08, 2002 at 10:41:26, Uri Blass wrote:
>>
>>>On June 08, 2002 at 10:08:41, Eugene Nalimov wrote:
>>>
>>>>Usually "profile-based optimizations" means that the *compiler* performs the
>>>>optimizations. I.e.
>>>>
>>>>(1) You build a special instrumented version of the program that, when run,
>>>>collects information about where the time in the code is spent, as well as
>>>>other information the compiler can use.
>>>>
>>>>(2) You run that instrumented program on a set of scenarios you consider
>>>>typical for your program.
>>>>
>>>>(3) You re-compile (or re-link, depending on the compiler used) your
>>>>program, this time specifying "use profile data from my training run". During
>>>>that compilation the compiler performs a lot of new optimizations that use the
>>>>profile data: code separation, function layout, basic block layout, more
>>>>aggressive inlining, loop unrolling, etc.
>>>>
>>>>The shipping Intel C++/Fortran compilers have that feature. Visual C++ can do
>>>>that for IA-64, but the shipped version for x86 does not include it.
>>>>
>>>>Eugene
>>>
>>>Thanks but I am afraid it is not going to help me to know how to do profile
>>>based optimizations.
>>>
>>>I guess that I need to see practical examples of profile based optimization in
>>>order to understand.
>>>
>>>I also have no idea how much speed can be gained from profile based
>>>optimization.
>>>
>>>If it is not more than 10% then it means that I am not going to care about it in
>>>the near future.
>>>
>>>I use Visual C++ but Movei is written in pure C.
>>>
>>>Uri
>>
>>
>>It can be 10% or even more.  But the main point is that _you_ do nothing
>>except compile with the profile option, run a good test set, then recompile,
>>telling the compiler to use the profile results to produce even faster
>>code.
>>
>>No effort at all...
>
>It still does not help me to understand how to do it.
>
>I guess that you do nothing after writing specific code that runs a good
>test suite and calculates some information, but I have no idea how to do it.
>
>I think that I can learn by seeing a practical example how to get a speed
>advantage from profile-based optimization.
>
>It is possible that the part of Crafty's code that runs a good test set may
>help me, but I doubt that I am going to understand it because Crafty is not a
>simple program.
>
>I guess that profile-based optimization can also help programs that are
>simpler to understand than Crafty, so it may be better if I see how it is done
>in a simple program.
>
>Uri


Here is what is going on:

_you_ must create some sort of test set that you believe represents the "normal"
things your program might do.  For Crafty, for example, I have a set of
positions that includes the starting position, some opening, middlegame and
endgame positions, and some tactical positions that win material or force
checkmate.
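
As a rough sketch, a test set like that could be kept as a small table in C.
Everything here is an illustrative assumption, not Crafty's actual set: the
struct, the five positions, and the sanity-check helper are made up for the
example, and a real set would be much larger.

```c
#include <stddef.h>
#include <string.h>

/* One entry of a hypothetical training set: a label plus a FEN string. */
struct training_position {
    const char *kind;
    const char *fen;
};

/* Illustrative positions only, one per category named in the post. */
static const struct training_position training_set[] = {
    { "start",      "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1" },
    { "opening",    "r1bqk1nr/pppp1ppp/2n5/2b1p3/2B1P3/5N2/PPPP1PPP/RNBQK2R w KQkq - 4 4" },
    { "middlegame", "r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq - 0 1" },
    { "endgame",    "8/8/8/4k3/8/8/4P3/4K3 w - - 0 1" },
    { "tactical",   "6k1/5ppp/8/8/8/8/8/R3K3 w - - 0 1" },  /* Ra8 is mate */
};

static const size_t training_set_size =
    sizeof training_set / sizeof training_set[0];

/* Returns 1 if the board field of a FEN has 8 ranks of 8 squares each. */
static int fen_board_is_well_formed(const char *fen)
{
    int ranks = 1, files = 0;
    for (; *fen != '\0' && *fen != ' '; ++fen) {
        if (*fen == '/') {
            if (files != 8) return 0;   /* previous rank was too short or long */
            ++ranks;
            files = 0;
        } else if (*fen >= '1' && *fen <= '8') {
            files += *fen - '0';        /* run of empty squares */
        } else if (strchr("pnbrqkPNBRQK", *fen) != NULL) {
            files += 1;                 /* one piece */
        } else {
            return 0;                   /* unexpected character */
        }
    }
    return ranks == 8 && files == 8;
}

/* Counts how many entries in the set pass the sanity check. */
static size_t count_well_formed_positions(void)
{
    size_t ok = 0;
    for (size_t i = 0; i < training_set_size; ++i)
        ok += (size_t)fen_board_is_well_formed(training_set[i].fen);
    return ok;
}
```

The program's own search would then be run once on each entry; the check is
only there so that a typo in a position does not quietly skew the training run.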

You compile the first time telling the compiler to produce code that contains
"profiling" instrumentation.  You run this compiled program against your test
set of positions.  The profiling code included in your compile will produce lots
of data, including things like:  which functions are called _many_ times, so
that they might be inlined at the "hot-spots";  which if-statements are true or
false most of the time, so that the unused "side" of them can be moved somewhere
else and prefetching during cache line fills brings in instructions that are
used, not instructions that are skipped over;  which loops might be effectively
unrolled because they are executed so often that making the program a bit bigger
will be offset by a speed gain;  and so forth.  All of this is learned while
you simply run your test position set through the profile-enabled executable.
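
To make the kind of data concrete, here is a hand-written imitation of what
the instrumented build records.  This is only a sketch under assumptions: a real
compiler inserts its own counters automatically and stores them in its own
format, and the struct, the toy evaluate() function and the "1 in 16" rare
case are all invented for illustration.

```c
#include <stdio.h>

/* Hand-written stand-ins for counters an instrumented build would keep
   automatically.  The names are hypothetical. */
struct profile_counts {
    unsigned long calls_to_evaluate;   /* how often the function is called */
    unsigned long rare_branch_taken;   /* if-statement was true            */
    unsigned long rare_branch_skipped; /* if-statement was false           */
};

static struct profile_counts profile;  /* zero-initialized */

/* Toy evaluation: the special case is rare, the normal path is common. */
static int evaluate(int position_id)
{
    ++profile.calls_to_evaluate;
    if (position_id % 16 == 0) {       /* rare side of the branch */
        ++profile.rare_branch_taken;
        return -50;
    }
    ++profile.rare_branch_skipped;     /* common side of the branch */
    return position_id % 7;
}

/* Runs the toy workload over ids 1..positions and returns the score sum. */
static long run_workload(int positions)
{
    long total = 0;
    for (int id = 1; id <= positions; ++id)
        total += evaluate(id);
    return total;
}

/* Prints the collected counts, i.e. the "profile" of this run. */
static void print_profile(void)
{
    printf("evaluate() calls:    %lu\n", profile.calls_to_evaluate);
    printf("rare branch taken:   %lu\n", profile.rare_branch_taken);
    printf("rare branch skipped: %lu\n", profile.rare_branch_skipped);
}
```

With counts like these in hand, a compiler can see that evaluate() is hot and
that the special case is almost never executed, which is exactly the
information needed for the inlining and code-layout decisions described above.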

Then you simply re-compile everything _again_ but this time you tell the
compiler to use the results of the profile-run you just completed, to further
optimize the code, now that it _knows_ which branches are commonly true, which
are false, which procedures are called most often, from where, etc.
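
For a concrete picture of the compile / run / re-compile cycle, here is a
sketch using GCC, which is an assumption on my part: the thread itself only
names Intel C++ and Visual C++, whose options differ.  GCC's -fprofile-generate
and -fprofile-use flags are real, but the file names, the workload and the
run_bench() function are placeholders, and the main() that would call
run_bench() once is left out.

```c
/*
 * Hypothetical GCC build sequence (file names are placeholders):
 *
 *   gcc -O2 -fprofile-generate -c engine.c -o engine.o   # instrumented compile
 *   gcc -fprofile-generate engine.o -o engine
 *   ./engine                        # training run; profile data (.gcda files)
 *                                   # is written when the program exits
 *   gcc -O2 -fprofile-use -c engine.c -o engine.o        # optimized re-compile
 *   gcc engine.o -o engine
 */
#include <stdint.h>

/* Counts set bits; stands in for a hot inner loop of a bitboard program. */
static int popcount64(uint64_t bits)
{
    int count = 0;
    while (bits != 0) {
        bits &= bits - 1;   /* clear the lowest set bit */
        ++count;
    }
    return count;
}

/* Training workload: meant to be called once from main() during the
   instrumented run. */
static long run_bench(int iterations)
{
    uint64_t board = UINT64_C(0x00FF00000000FF00);  /* two full ranks set */
    long total = 0;
    for (int i = 0; i < iterations; ++i) {
        if (board == 0)     /* cold path: never taken in this workload */
            return -1;
        total += popcount64(board);
        board = (board << 1) | (board >> 63);  /* rotate to vary the input */
    }
    return total;
}
```

The source is identical in both compiles; only the flags change.  After the
training run the compiler knows the loop in popcount64() is hot and the early
return in run_bench() is never taken, and can lay out the code accordingly.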

Typically this second compile will produce a further 10% speed improvement.

The compiler doesn't tell _you_ what to do to make the program faster.  It
simply uses the profile output to do things more efficiently based on the input
code (program) you give it.  Yes, you might rewrite things to make them even
more efficient, but the compiler won't tell you how to do this.  It just takes
the source you give it, observes what is really going on during the profile
run, then recompiles the code a second time using this information to make the
resulting machine language even more efficient...



