Computer Chess Club Archives

Search

Terms
Messages

Subject: Re: question about branchless code

Author: Uri Blass
Date: 14:26:43 02/27/03
On February 27, 2003 at 16:54:05, Ricardo Gibert wrote:

>On February 27, 2003 at 11:03:20, Uri Blass wrote:
>
>>On February 27, 2003 at 10:14:35, Matt Taylor wrote:
>>
>>>On February 27, 2003 at 07:44:19, Uri Blass wrote:
>>>
>>>>On February 27, 2003 at 05:23:25, Tony Werten wrote:
>>>>
>>>>>On February 26, 2003 at 14:04:16, Uri Blass wrote:
>>>>>
>>>>>>I had the following code in my program
>>>>>>
>>>>>>do
>>>>>>{
>>>>>>...
>>>>>>if (info[target]==EMPTA)
>>>>>>  target+=8;
>>>>>>else
>>>>>>  target=64;
>>>>>>}
>>>>>>while (target<64)
>>>>>
>>>>>Less branches:
>>>>>
>>>>>do
>>>>>{
>>>>>...
>>>>>  tmp:=(info[target]!=EMPTY);  // maybe typecast to int ?
>>>>>  target+=(tmp*256)+8;
>>>>>} while (target<64);
>>>>>
>>>>>if (target>256) target=64;
>>>>>
>>>>>
>>>>>But I'm not sure if this works in C. (In pascal compares are garantied to return
>>>>>0 or 1 )
>>>>
>>>>It works in C(except the fact that I need to use a varaible for tmp)
>>>>
>>>>I even do not need the last if (target>256) target==64 because target is not
>>>>used after the loop but I am not sure if it makes me faster.
>>>>
>>>>I have many similiar loops and by comparing times by changing one loop
>>>>I find that it is even slightly slower.
>>>>
>>>>I used a global varaible for tmp because if I use a local caraible that I need
>>>>to define at the beginning of the function then it means that I need to change
>>>>all the similiar functions in order to practically use it.
>>>>
>>>>Uri
>>>
>>>Using a variable is not necessarily a bad thing. Many variables get optimized
>>>away. Even if you use the computation as an intermediate, there is no guarantee
>>>that the compiler won't store it to memory.
>>>
>>>One common technique for optimizing variable usage is to convert code into a
>>>so-called IL ("Intermediate Language"). The IL supports an basically infinite
>>>number of registers. Each calculation you perform goes into one of these
>>>registers, and it's never modified afterward. The compiler then performs
>>>liveness analysis on the IL to determine when it can throw away computations.
>>>Other things can affect optimization (such as whether or not you take the
>>>address of the variable), but in this case and many others, you are no worse off
>>>to use a variable over an intermediate computation.
>>>
>>>If your loops are all similar, it means you should probably put one copy in a
>>>function.
>> Global variables are discouraged because they cause thread re-entry
>>>issues. (i.e. you will have many problems if you decide to take advantage of
>>>parallel processing.)
>>>
>>>Again, the speed difference you're talking about is very minimal. Concentrate on
>>>a good design first and worry about speed later. You might address speed in your
>>>design, but optimization is an iterative, topological process. First you
>>>optimize your algorithms (i.e. bitboards are faster in some cases, arrays are
>>>faster in others, etc.) Next you can optimize memory access. If you want to
>>>optimize to the clock cycle, you can then work on individual loops like this.
>>>However, you want to use a profiler to do it -- there is no sense in optimizing
>>>an infrequently executed loop when there are a dozen others that could use
>>>optimizing. Speed often goes hand-in-hand with a good design.
>>>
>>>-Matt
>>
>>I know that I can get bigger speed improvements from changing the design and
>>improving the algorithm but this process is longer than changing few lines
>>and I like to see first what I can gain by changing few lines.
>>
>>I do not think that there is a contradiction between better design and
>>these speed improvements.
>>
>>I also found some small speed improvement that I can describe as improvement of
>>the algorithm
>>
>>My initial loop was the following loop(I include also the line before the loop.
>>
>>target+=8;
>>do
>>{
>>   directsee[target]&=~131074;
>>   if (info[target]==EMPTA)
>>     target+=8;
>>   else
>>     target=64;
>>}
>>while (target<64);
>>
>>better is
>>
>>do
>>{
>>  target+=8;
>>  directsee[target]&=~131074;
>>  if (info[target]!=EMPTA)
>>    break;
>>}
>>while (target<56)
>
>
>This version is more readable:
>
>do
>{
>  directsee[target]&=~131074;
>  target+=8;
>}
>while (info[target]==EMPTA && target<64);   // note: test order affects speed
>
>
>If it can be assumed that the definition of "info[target]" is as "unsigned
>info[64]", then the way to handle this with sentinels is to extend the array as
>in "unsigned info[72]" and to set "info[i] = FAIL" for i in [64,71].

Thanks note that I already did it and info is not a defined as an array so
I have info[i]=-1 for i in [64,71] or i in [-8,-1]

It helped me in other cases but I did not think that it can help me here.

Uri
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.