Author: Matt Taylor
Date: 09:26:03 01/07/03
Go up one level in this thread
On January 07, 2003 at 04:31:55, Uri Blass wrote:
>On January 06, 2003 at 22:11:11, Matt Taylor wrote:
>
>>On January 06, 2003 at 19:49:10, Uri Blass wrote:
>>
>>>On January 06, 2003 at 18:55:22, Heiner Marxen wrote:
>>>
>>>>On January 03, 2003 at 20:26:44, Matt Taylor wrote:
>>>>
>>>>>On January 03, 2003 at 14:27:02, Ricardo Gibert wrote:
>>>>>
>>>>>>On January 03, 2003 at 14:09:56, Uri Blass wrote:
>>>>>>
>>>>>>>On January 03, 2003 at 14:04:23, Ricardo Gibert wrote:
>>>>>>>
>>>>>>>>On January 03, 2003 at 12:03:45, Uri Blass wrote:
>>>>>>>>
>>>>>>>>>code B is slightly faster than code A.
>>>>>>>>>I know that side can get only 0 or 1(something that the compiler does not know)
>>>>>>>>>and B is eqvivalent to A if you assume that side gets only 0 or 1.
>>>>>>>>>
>>>>>>>>>Is it possible to write a third code that will be even faster than B?
>>>>>>>>>
>>>>>>>>>I think that if the compiler can know that side is or 0 or 1 it can do B even
>>>>>>>>>faster.
>>>>>>>>>
>>>>>>>>>code A:
>>>>>>>>>
>>>>>>>>>if (side==LIGHT)
>>>>>>>>>{
>>>>>>>>> if (to>=56)
>>>>>>>>> {
>>>>>>>>> gen_promote(from,to,bits);
>>>>>>>>> return;
>>>>>>>>> }
>>>>>>>>>}
>>>>>>>>>else
>>>>>>>>>{
>>>>>>>>> if (to<=7)
>>>>>>>>> {
>>>>>>>>> gen_promote(from,to,bits);
>>>>>>>>> return;
>>>>>>>>> }
>>>>>>>>>}
>>>>>>>>
>>>>>>>>
>>>>>>>>Try something like this:
>>>>>>>>
>>>>>>>>if ((to<=7) || (to>=56))
>>>>>>
>>>>>>
>>>>>>You could also use "if (to-8 >= 48)" instead of the above. This assumes "to" is
>>>>>>unsigned. It would be a shame for you to duplicate code just avoid 1 extra
>>>>>>operation (the subtraction).
>>>>><snip>
>>>>>
>>>>>That is awesome. I did not even think of that. That is probably the best code
>>>>>possible to put there.
>>>>>
>>>>>-Matt
>>>>
>>>>That is a trick to do a range check, which I first saw in some code generated
>>>>from a C compiler for a switch statement (loong ago).
>>>>
>>>>From the sources of Chest:
>>>>
>>>>/*
>>>> * in_range() is general purpose range checking macro for integral types.
>>>> * Note, that the lower bound is inclusive and the upper bound exclusive !
>>>> * Note, that only the lower bound is expanded more than once.
>>>> */
>>>>#define in_range(x, lo, hi) \
>>>> ( (((unsigned long)(x )) - ((unsigned long)(lo))) \
>>>> < (((unsigned long)(hi)) - ((unsigned long)(lo))) \
>>>> )
>>>>
>>>>Cheers,
>>>>Heiner
>>>
>>>The question is if the compiler does not know to do that trick if
>>>I tell it
>>>if ((x>=lo)&&(x<hi))
>>>
>>>The point is that I am not sure if it is a good idea to spend energy about
>>>tricks when tricks may in some cases even prevent the compiler to find even
>>>better tricks.
>>>The only trick that may be productive is trick that is based on knowledge that
>>>the compiler cannot know but I think that even in this case a smart compiler
>>>should ask the user productive question.
>>>
>>>Here is an example
>>>
>>>I had in my program the following code for finding the direction that the king
>>>is attacked.
>>>
>>>temp=directsee[kingsquare[side]];
>>>if (side==0)
>>> temp=(temp>>16)&65535;
>>> else
>>> temp=temp&65535;
>>>
>>>
>>>I know that side cannot get values except 0 or 1.
>>>
>>>A smart compiler can run the program find that side always get only 0 or 1 and
>>>ask me if side can get value that is not 0 or 1.
>>>
>>>Unfortunately I guess that compilers are not smart enough so they do not do it.
>>>
>>>The result is that I use the following trick
>>>
>>>#define dsee(target,side) ((directsee[(target)]>>((side)<<4))&65535)
>>>temp=dsee([kingsquare[side]],1^side)
>>>
>>>People told me that I can use BOOL instead of int in C++ but I use only C files
>>>and I do not like to change my files to .cpp because I read that C++ is not
>>>exactly more than C and there are cases when the same code in C++ can give
>>>different results than the same code in C.
>>>
>>>Uri
>>
>>Changing the extension to .cpp should have no impact on the performance of the
>>code. Theoretically the C++ compiler is just as good at optimizing C code as the
>>C compiler is. Because of the way compilers are built, this should hold true.
>>
>>No compiler I know of can afford to assume that side is either 0 or 1. Profiled
>>or not, it means you would have to trigger -every- possible value to ensure
>>correct code generation. It is theoretically possible to determine possible
>>values for side, but I don't know of any compiler that currently does this.
>>Ironic that this point comes up because I was discussing today the possibility
>>of building a tool to do just that. :-)
>>
>>FYI, with full optimization VC generated two comparisons for your test.
>>
>>Also FYI, I can see much potential if you allow side to be 0 or -1. Assuming 0
>>or -1 (instead of 1), your other code becomes:
>>
>>temp = directsee[kingsquare[side]];
>>temp = (((temp >> 16) & ~side) | (temp & side)) & 65535;
>>
>>Not as readable, but much faster.
>>
>>Also, with regard to pure C optimization, I discovered a day or two ago that the
>>AMD manual addresses optimization of -pure- C code. The following links to a
>>post I made about it:
>>http://www.talkchess.com/forums/1/message.html?275188
>>
>>-Matt
>
>Thanks
>
>I added this site to my favourites and I may look at it later.
>
>The question is if I cannot use define that will give the same result
>when I put directsee[target] instead of your temp in the defintion.
>
>If I can do it then I do not see why the code is less readable.
>The only possible problem is if the compiler does not do the same optimization
>for your 2 commands and for the following single C command:
>
>temp = (((directsee[kingsquare[side]] >> 16) & ~side) |
>(directsee[kingsquare[side]] & side)) & 65535;
>
>Uri
Usually the best optimizations that can be made involve branch elimination. It
is acceptable to introduce a fair number of cycles in the assembly code to avoid
a mispredicted branch because they are -expensive-. The value of this C code is
that it avoids the mispredicted branch, and it is not much more expensive. I was
overly optimistic about branch elimination optimizations that compilers make;
apparently they're not too good at tricks like this.
The compiler should also generate moderately efficient code here. It will work
on both parts at the same time. The shift is problematic, but it's not -too-
bad. To make it more efficient, the only things I can think of are either (1)
flipping the order of the data in directsee or (2) eliminating the second table
translation. Neither is going to gain you much, but it all depends on how fast
this code has to be.
I did forget, you'll need to negate side when indexing kingsquare. The other
option is to flip the order and add 1 to side (e.g. when side=0, use
kingsquare[1], and when side=-1, use kingsquare[0]). The latter is more
efficient for the processor, and the compiler will generate efficient code.
-Matt
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.