Author: Robert Hyatt
Date: 19:52:06 01/05/03
Go up one level in this thread
On January 05, 2003 at 17:06:56, Matt Taylor wrote:
>>I was chugging away on the older binary search routine yesterday. I've modified
>>it to use a 5th register. This allows me to stagger the accumulation and make
>>use of extra cycles that were burning waiting on instructions to retire. It is
>>amazingly efficient -- considering the maximum of 3 instructions/cycle, there
>>are only 7 slots wasted across the entire routine (38 instructions). This means
>>optimum execution time for this sequence is 13 cycles, and I'm doing it in 15.
>
>Oops, did I say 15? I should correct myself. I am now doing it in 14. Only 4
>slots not utilized. I fixed a bug and gained performance. :-)
>
>-Matt
\
If you like cute code, here is Eugene's FirstOne() function for
Crafty. It is cute, but has one drawback in that it can not be inlined
due to the addressing computations he does which means that the value being
checked for FirstOne() has to be in memory...
cmpl $1, 8(%esp)
sbbl %eax, %eax
movl 8(%esp,%eax,4), %edx
bsr %edx, %ecx
jz l4
andl $32, %eax
subl $31, %ecx
subl %ecx, %eax
ret
l4: movl $64, %eax
ret
I think you will be amused if you work thru it. I should add that Crafty's
"bit numbering" has the MSB (leftmost) as bit 0, LSB (rightmost) as bit 63,
which is how the Cray does this due to the fact that they have a "leadz"
instruction that counts leading (leftmost) zeroes rather than finding the
first one bit. The Cray also returns 64 if there are no bits set. Eugene's
code emulates this perfectly.
I set about trying to turn this into something inlineable, and I also wanted
to do it with _zero_ jumps, just for fun. Here is what I came up with:
movl $-1, %ecx ;magic number, becomes 64 if no bits on
bsr 4(%esp), %eax ;check right end first
cmovz %ecx, %eax ;keep bit number or magic number
subl $32, %eax ;adjust to right hand range
bsr 8(%esp), %ecx ;check left end next
cmovz %eax, %ecx ;keep count if non-zero, otherwise use other
addl $32,%ecx ;adjust so subtract below works
movl $63, %eax ;reverse numbers to 0-64 (0-63 + nobit N)
subl %ecx, %eax ;by subtracting from 63
ret
If you walk thru that, you will see that there are three possible things
to do. If both words are zero, I need 64 as the answer. If the leftmost
32 bits are non-zero, I need the index to the first one bit there, otherwise
I need the index to the first non-zero bit n the right 32 bits.
My approach is to test the rightmost 32 bits first, and either get the bit
number that is set, or a special constant (more in a second). I then test
the leftmost 32 bits, and if one of those is set, I use that number and
adjust it to be 63-N to match my numbering.
The above code tests the right 32 bits and produces a constant that when
subtracted from 63 either returns 64 (the -1 at the top) or a number between
32 and 63. It then tests the left end and if nothing is set, it uses the
previous result and adjusts is. If one is set in the left end, I use that
value instead and still adjust it with 63-N.
Goes to show that cmov solves a lot of jmp-able issues with some thought.
Note that I have not tried to make this "fast". I just wanted to make it
work in a way that is pretty understandable... And it is inlineable because
it doesn't depend on the address subtraction trick Eugene does.
The main problem (as always) with X86 is that there are 4 GP registers, where
other processors have a bunch (31 on the SPARC, for instance).
I did the same for LastOne() as well, but am not posting it unless someone
wants to see it. Same basic idea, just scanned from the opposite end.
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.