Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: LSB and POPCOUNT optimization

Author: Anthony Cozzie

Date: 21:33:00 06/18/02

Go up one level in this thread


I just spent a few days playing around with LSB functions, and this is the best
I have been able to do.  This runs on approximately 31 clocks on my athlon,
which is 50% or more faster than any implementation I had based on the BSF
instruction.  The problem with BSF is it is one of the "CISC instructions of the
past" and actually involves testing each bit.  While this is done in microops
and not real loops, I think its still possible to better.

If anyone can optimize this code or suggest improvements, I'd really be
interested in seeing them.

//This is in ATT [gcc] syntax, destination is on the right, source on the left
movl 8(%esp), %eax
movl 12(%esp), %ebx
movl $32, %edx
xor %ecx, %ecx
testl %eax, %eax
cmove %ebx, %eax
cmove %edx, %ecx

movl %eax, %ebx
and $65535, %eax
movl %ecx, %edx
shr $16, %ebx
add $16, %edx
testl %eax, %eax
cmove %ebx, %eax
cmove %edx, %ecx

movl %eax, %ebx
and $255, %eax
movl %ecx, %edx
shr $8, %ebx
add $8, %edx
test %eax, %eax
cmove %ebx, %eax
cmove %edx, %ecx

movsbl lead_one_lut(%eax), %eax
add %ecx, %eax

Here lead_one_lut is a 256 byte table with the precomputed LSB for an 8 bit
integer.



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.