Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: question about branchless code

Author: Ricardo Gibert

Date: 13:54:05 02/27/03

Go up one level in this thread


On February 27, 2003 at 11:03:20, Uri Blass wrote:

>On February 27, 2003 at 10:14:35, Matt Taylor wrote:
>
>>On February 27, 2003 at 07:44:19, Uri Blass wrote:
>>
>>>On February 27, 2003 at 05:23:25, Tony Werten wrote:
>>>
>>>>On February 26, 2003 at 14:04:16, Uri Blass wrote:
>>>>
>>>>>I had the following code in my program
>>>>>
>>>>>do
>>>>>{
>>>>>...
>>>>>if (info[target]==EMPTA)
>>>>>  target+=8;
>>>>>else
>>>>>  target=64;
>>>>>}
>>>>>while (target<64)
>>>>
>>>>Less branches:
>>>>
>>>>do
>>>>{
>>>>...
>>>>  tmp:=(info[target]!=EMPTY);  // maybe typecast to int ?
>>>>  target+=(tmp*256)+8;
>>>>} while (target<64);
>>>>
>>>>if (target>256) target=64;
>>>>
>>>>
>>>>But I'm not sure if this works in C. (In pascal compares are garantied to return
>>>>0 or 1 )
>>>
>>>It works in C(except the fact that I need to use a varaible for tmp)
>>>
>>>I even do not need the last if (target>256) target==64 because target is not
>>>used after the loop but I am not sure if it makes me faster.
>>>
>>>I have many similiar loops and by comparing times by changing one loop
>>>I find that it is even slightly slower.
>>>
>>>I used a global varaible for tmp because if I use a local caraible that I need
>>>to define at the beginning of the function then it means that I need to change
>>>all the similiar functions in order to practically use it.
>>>
>>>Uri
>>
>>Using a variable is not necessarily a bad thing. Many variables get optimized
>>away. Even if you use the computation as an intermediate, there is no guarantee
>>that the compiler won't store it to memory.
>>
>>One common technique for optimizing variable usage is to convert code into a
>>so-called IL ("Intermediate Language"). The IL supports an basically infinite
>>number of registers. Each calculation you perform goes into one of these
>>registers, and it's never modified afterward. The compiler then performs
>>liveness analysis on the IL to determine when it can throw away computations.
>>Other things can affect optimization (such as whether or not you take the
>>address of the variable), but in this case and many others, you are no worse off
>>to use a variable over an intermediate computation.
>>
>>If your loops are all similar, it means you should probably put one copy in a
>>function.
> Global variables are discouraged because they cause thread re-entry
>>issues. (i.e. you will have many problems if you decide to take advantage of
>>parallel processing.)
>>
>>Again, the speed difference you're talking about is very minimal. Concentrate on
>>a good design first and worry about speed later. You might address speed in your
>>design, but optimization is an iterative, topological process. First you
>>optimize your algorithms (i.e. bitboards are faster in some cases, arrays are
>>faster in others, etc.) Next you can optimize memory access. If you want to
>>optimize to the clock cycle, you can then work on individual loops like this.
>>However, you want to use a profiler to do it -- there is no sense in optimizing
>>an infrequently executed loop when there are a dozen others that could use
>>optimizing. Speed often goes hand-in-hand with a good design.
>>
>>-Matt
>
>I know that I can get bigger speed improvements from changing the design and
>improving the algorithm but this process is longer than changing few lines
>and I like to see first what I can gain by changing few lines.
>
>I do not think that there is a contradiction between better design and
>these speed improvements.
>
>I also found some small speed improvement that I can describe as improvement of
>the algorithm
>
>My initial loop was the following loop(I include also the line before the loop.
>
>target+=8;
>do
>{
>   directsee[target]&=~131074;
>   if (info[target]==EMPTA)
>     target+=8;
>   else
>     target=64;
>}
>while (target<64);
>
>better is
>
>do
>{
>  target+=8;
>  directsee[target]&=~131074;
>  if (info[target]!=EMPTA)
>    break;
>}
>while (target<56)


This version is more readable:

do
{
  directsee[target]&=~131074;
  target+=8;
}
while (info[target]==EMPTA && target<64);   // note: test order affects speed


If it can be assumed that the definition of "info[target]" is as "unsigned
info[64]", then the way to handle this with sentinels is to extend the array as
in "unsigned info[72]" and to set "info[i] = FAIL" for i in [64,71]. Now you can
eliminate one of the tests as follows:

do
{
  directsee[target]&=~131074;
  target+=8;
}
while (info[target]==EMPTA);

Now the test "info[target]==EMPTA" will always fail when target >= 64 thanks to
the FAIL sentinels. This avoids the "target<64" part of the test.

Or how about the following one liner:

for (; info[target]!=EMPTA; target+=8)  directsee[target]&=~131074;


>
>I can say that it is improvement of the algorithm because I save target+=8
>that is not relevant.
>
>Note that the arrays that I used in this post are not going to be changed in the
>better design.
>
>The name EMPTA and not EMPTY was used because I have different constants for
>pieces and for information of the square.
>
>I have PAWN=0,KNIGHT=1,EMPTY=6
>
>I also have
>
>info[squarre]=
>whitepawn=0,whiteknight=1,...blackpawn=8,...
>EMPTA=22
>
>The idea is that I want to find by info[square] the piece and the color by
>simple defines so I can do it by defines like defining piece(square)
>to be info[square]&7
>
>
>Uri



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.