Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: question about optimizing code

Author: Gerd Isenberg

Date: 12:32:12 07/07/03

Go up one level in this thread


On July 07, 2003 at 15:09:16, Andrew Dados wrote:

>On July 07, 2003 at 14:45:19, Gerd Isenberg wrote:
>
>>On July 07, 2003 at 12:26:17, Andrew Dados wrote:
>>
>>>On July 06, 2003 at 23:25:00, Uri Blass wrote:
>>>
>>>>In movei I have a number between 0 and 255 and I want to do some functions for
>>>>every 1 that it has in its binary representation(number of functions to
>>>>implement is the same as the number of 1's in the number and there is a
>>>>different function based on the value of the 1)
>>>>
>>>>I do not like to call functions because functions use local varaibles and
>>>>generating a copy of the local varaibles only to forget them may take computer
>>>>time and I also do not use the functions more than once.
>>>>
>>>>I thought to use switch(see the code later in this post) but the problem is that
>>>>switch continue to do everything without break and I want to have switch that
>>>>does not continue to do everything but does not quit and simply forget about the
>>>>fact that I called switch.
>>>>
>>>>Here is the type of the code that I plan to have now.
>>>>I do not like all the if (number&2) or if (number&4) and the question is if I
>>>>can do it faster.
>>>>
>>>>
>>>>i=smallestpower[number];
>>>>switch(i)
>>>>{
>>>>   case 0:do list of commands
>>>>          if (number==1) break
>>>>          else
>>>>          {
>>>>            number-=1;
>>>>            i=smallestpower[number];
>>>>          }
>>>>  case 1:if (number&2)
>>>>         {
>>>>           do list of commands
>>>>           if (number==2) break;
>>>>           else
>>>>           {
>>>>             number-=2;
>>>>             i=smallestpower[number];
>>>>           }
>>>>         }
>>>>  case 2:if (number&4)
>>>>         {
>>>>           do list of commands
>>>>           if (number==4) break;
>>>>           else
>>>>           {
>>>>             number-=4;
>>>>             i=smallestpower[number];
>>>>           }
>>>>         }
>>>>  case 3:if (number&8)
>>>>         ...
>>>>  ...
>>>>  case 6:if (number&64)
>>>>         ...
>>>>  case 7://if you got there you know that number is 128 because you did not get
>>>>out by break so you do not need to check it
>>>>
>>>>
>>>>Uri
>>>
>>>boolean checking is way faster then jumps; in your implementation you get about
>>>6 jumps in the code (case is a jump) since you have about 1.2 bit set on
>>>average. I'll try to go down to below 4 jumps average:
>>>
>>>if i & (128+64){
>>> if  i & 128 {}
>>> if  i & 64  {}
>>>if ! i | 63 goto done;
>>>}
>>>
>>>if i & (32+16){
>>> if  i & 32 {}
>>> if  i & 16  {}
>>>if ! i | 15 goto done;
>>>}
>>>
>>>if i & (8+4){
>>> if  i & 8 {}
>>> if  i & 4  {}
>>>}
>>>
>>>if i & (2+1){
>>> if  i & 2 {}
>>> if  i & 1  {}
>>>}
>>>
>>>done:
>>>
>>>--------------
>>>Since you have small number of average bits set this should be way faster then
>>>case switch
>>
>> - no, see below.
>>
>>>
>>>-Andrew-
>>
>>Hi Andrew,
>>
>>interesting asymmetric approach based on Uri's probability of the higher bits.
>>
>>Lets see, One bit set:
>>bit 7,6 - 4 compares
>>bit 5,4 - 1 + 4 = 5 compares
>>bit 3,2 - 1 + 1 + 3 + 1 = 6 compares
>>bit 1,0 - 1 + 1 + 1 + 3 = 6 compares
>>
>>Fine - and all seldom cases of multiple bits are handled too.
>>That's the real advantage of your code - it don't acts like a switch or a "if
>>else if" cascade with one mutual exclusive case. It behaves exactly like a
>>sequence of eight if stamements, considering perfectly Uri's bit derivation.
>>
>>Thought about an outer if i & (128+64+32+8) for a further binary divide but it
>>is worse here.
>>
>>The switch-statement per se should be superiour due to the consecutive cases
>>from 0 to 8 - with the help of Uri's firstOne byte lookup. So only one indirect
>>jump (eg. in assembler jmp [_jmptable + 4*eax]) - but all cases are exclusive
>>and one must handle multiple set bits in an outer while-loop or Reinhard's
>>individual goto_work do-while.
>>
>>So in best case Tim's while-switch approach requires 3 conditional or indirect
>>jumps with one bit set (two while and one indirect switch jump, not considered
>>range checking) but additional memory access and clearing the traversed bit.
>>Reinhard's goto_work do-while approach needs one compare less and profits from
>>set bit 7.
>>
>>So in average i guess Reinhard's  approach is fastest.
>>
>>But the fastest of course is the switch(256) approach, if code size doesn't
>>matter and the jumptable is in cache ;-)
>>
>>Regards,
>>Gerd
>
>2 remarks:
>Jump to memory reference takes 4 cycles,no branch prediction can be applied,
>short jump takes 1 cycle; while you are counting compares I was counting jumps
>(1-2 less then compare in each of your cases).
>
>I guess simply benchmarking that code against Reinhards should solve the issue.
>
>-Andrew-

That's it.

Aha, Compares and jumps, hmm... 50 percent propability of missprediction?
Yes you are right with the indirect jump, always mispredected.

Some further thoughts on the switch(256), which requires only one misspredicted
indirect jump:

Lets say one bit-handling requires ~32Bytes
(about 5-8 instructions or 2-3 C-Statements).
Eight bit set cases (1..8) makes 8*32 = 256Bytes = 1/4KByte
*128 for 256 cases  == 32KByte Code - peanuts ;-)

256 Byte per bit handling require 256KByte routine - hmm.
If the correct, mispredicted case code is not already in 1.Level cache?

Maybe your code is best.

Cheers,
Gerd



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.