Author: Uri Blass
Date: 08:06:20 02/28/03
On February 28, 2003 at 10:58:21, Matt Taylor wrote:
>On February 27, 2003 at 11:03:20, Uri Blass wrote:
>
>>On February 27, 2003 at 10:14:35, Matt Taylor wrote:
>>
>>>On February 27, 2003 at 07:44:19, Uri Blass wrote:
>>>
>>>>On February 27, 2003 at 05:23:25, Tony Werten wrote:
>>>>
>>>>>On February 26, 2003 at 14:04:16, Uri Blass wrote:
>>>>>
>>>>>>I had the following code in my program
>>>>>>
>>>>>>do
>>>>>>{
>>>>>>...
>>>>>>if (info[target]==EMPTA)
>>>>>> target+=8;
>>>>>>else
>>>>>> target=64;
>>>>>>}
>>>>>>while (target<64);
>>>>>
>>>>>Fewer branches:
>>>>>
>>>>>do
>>>>>{
>>>>>...
>>>>> tmp=(info[target]!=EMPTY); // maybe typecast to int ?
>>>>> target+=(tmp*256)+8;
>>>>>} while (target<64);
>>>>>
>>>>>if (target>256) target=64;
>>>>>
>>>>>
>>>>>But I'm not sure if this works in C. (In Pascal, compares are guaranteed to
>>>>>return 0 or 1.)
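
Written out as plain C (a sketch: the wrapper function and the array
declarations are only there so it compiles, and the function name is
hypothetical; the loop body and constants are the ones quoted above), the
branchless version would be:

#define EMPTA 22                         /* square-info code for "empty" */

static int info[64];
static int directsee[64];

void scan_file_up(int target)            /* hypothetical wrapper */
{
    int tmp;

    do
    {
        directsee[target] &= ~131074;
        tmp = (info[target] != EMPTA);   /* != gives exactly 0 or 1 in C */
        target += (tmp * 256) + 8;       /* +8 to continue, +264 to leave */
    } while (target < 64);

    /* only needed if target is used after the loop:
       if (target > 256) target = 64; */
}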
>>>>
>>>>It works in C (except that I need to use a variable for tmp).
>>>>
>>>>I do not even need the last if (target>256) target=64; because target is not
>>>>used after the loop, but I am not sure whether it makes me faster.
>>>>
>>>>I have many similar loops, and by comparing times after changing one loop
>>>>I find that it is even slightly slower.
>>>>
>>>>I used a global variable for tmp because if I use a local variable that I need
>>>>to define at the beginning of the function, then it means that I need to change
>>>>all the similar functions in order to use it in practice.
>>>>
>>>>Uri
>>>
>>>Using a variable is not necessarily a bad thing. Many variables get optimized
>>>away. Even if you use the computation as an intermediate, there is no guarantee
>>>that the compiler won't store it to memory.
>>>
>>>One common technique for optimizing variable usage is to convert code into a
>>>so-called IL ("Intermediate Language"). The IL supports a basically infinite
>>>number of registers. Each calculation you perform goes into one of these
>>>registers, and it's never modified afterward. The compiler then performs
>>>liveness analysis on the IL to determine when it can throw away computations.
>>>Other things can affect optimization (such as whether or not you take the
>>>address of the variable), but in this case and many others, you are no worse off
>>>using a variable than using an intermediate computation.
>>>
>>>If your loops are all similar, it means you should probably put one copy in a
>>>function.
>>>Global variables are discouraged because they cause thread re-entry
>>>issues. (i.e. you will have many problems if you decide to take advantage of
>>>parallel processing.)
>>>
>>>Again, the speed difference you're talking about is very minimal. Concentrate on
>>>a good design first and worry about speed later. You might address speed in your
>>>design, but optimization is an iterative, topological process. First you
>>>optimize your algorithms (e.g. bitboards are faster in some cases, arrays are
>>>faster in others, etc.) Next you can optimize memory access. If you want to
>>>optimize to the clock cycle, you can then work on individual loops like this.
>>>However, you want to use a profiler to do it -- there is no sense in optimizing
>>>an infrequently executed loop when there are a dozen others that could use
>>>optimizing. Speed often goes hand-in-hand with a good design.
>>>
>>>-Matt
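
A concrete sketch of the variable point (the block-scoped tmp is only a
suggestion, not something quoted above; the wrapper function is hypothetical):
declaring the temporary inside the loop body avoids the global, so each thread
gets its own copy, no function header has to change, and the compiler is free
to keep it in a register or remove it entirely.

#define EMPTA 22

static int info[64];
static int directsee[64];

void scan_file_up_local(int target)      /* hypothetical wrapper */
{
    do
    {
        directsee[target] &= ~131074;
        {
            int tmp = (info[target] != EMPTA);  /* local: no re-entry problem */
            target += tmp * 256 + 8;
        }
    } while (target < 64);
}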
>>
>>I know that I can get bigger speed improvements from changing the design and
>>improving the algorithm, but this process takes longer than changing a few
>>lines, and I like to see first what I can gain by changing a few lines.
>>
>>I do not think that there is a contradiction between better design and
>>these speed improvements.
>>
>>I also found a small speed improvement that I can describe as an improvement
>>of the algorithm.
>>
>>My initial loop was the following (I also include the line before the loop):
>>
>>target+=8;
>>do
>>{
>> directsee[target]&=~131074;
>> if (info[target]==EMPTA)
>> target+=8;
>> else
>> target=64;
>>}
>>while (target<64);
>>
>>Better is:
>>
>>do
>>{
>> target+=8;
>> directsee[target]&=~131074;
>> if (info[target]!=EMPTA)
>> break;
>>}
>>while (target<56);
>>
>>I can say that it is an improvement of the algorithm because I save a
>>target+=8 that is not needed.
>>
>>Note that the arrays that I used in this post are not going to be changed in the
>>better design.
>>
>>I used the name EMPTA and not EMPTY because I have different constants for
>>pieces and for the information of a square.
>>
>>I have PAWN=0,KNIGHT=1,EMPTY=6
>>
>>I also have
>>
>>info[square]=
>>whitepawn=0,whiteknight=1,...blackpawn=8,...
>>EMPTA=22
>>
>>The idea is that I want to find the piece and the color from info[square] with
>>simple defines, so I can do it with defines like defining piece(square)
>>to be info[square]&7.
>>
>>
>>Uri
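
Spelled out as defines (a sketch: piece() and the constants are the ones quoted
above; the color() define is only my guess at the natural companion, not
something from the post):

#define PAWN   0
#define KNIGHT 1
#define EMPTY  6
#define EMPTA  22                  /* square-info code for an empty square */

static int info[64];

#define piece(square)  (info[(square)] & 7)   /* 22 & 7 == 6 == EMPTY */
#define color(square)  (info[(square)] >> 3)  /* 0 = white, 1 = black, 2 = empty */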
>
>Well, the loop you're looking at is very short, and there isn't really much you
>can do to optimize it unless you want to write it in assembly. You might save a
>memory reference here, avoid an addition there, etc. Unless that loop executes
>-often-, you're wasting your time.
>
>I think the sentinel idea is pretty good. I don't know if it will actually speed
>the loop up, but it's worth looking at.
>
>-Matt
You are right that the loop is not executed very often.
I have many similar loops, but I have a switch statement, and in practice they
are not executed often.
The loop is part of my make move and unmake move, and in practice
my move generator, not my move maker, takes more time.
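
About the sentinel idea: one way to read it (this is only an assumption, it is
not spelled out above) is to pad the board arrays with one rank of squares
above the board that are never EMPTA, so the emptiness test alone ends the scan
and the target<56 comparison disappears, at the cost of one harmless extra
write into the padding rank. A minimal sketch:

#define EMPTA 22

static int info[72];        /* 64 board squares + 8 sentinel squares */
static int directsee[72];

void init_sentinels(void)
{
    int sq;
    for (sq = 64; sq < 72; sq++)
        info[sq] = 0;       /* any value != EMPTA works as a sentinel */
}

void scan_file_up(int target)   /* hypothetical wrapper */
{
    do
    {
        target += 8;
        directsee[target] &= ~131074;
    } while (info[target] == EMPTA);
}
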
Uri