Author: Dieter Buerssner
Date: 13:19:07 02/03/04
On February 02, 2004 at 23:14:12, Omid David Tabibi wrote:
>Currently, my only critical assembly parts are the following:
[...]
>__forceinline UINT32 countBitsTrue(UINT32 data) {
> __asm {
> mov ecx, dword ptr data
> xor eax, eax
> test ecx, ecx
> jz l1
> l0: lea edx, [ecx-1]
> inc eax
> and ecx, edx
> jnz l0
> l1:
> }
>}
Why are you using inline assembly for this? Any decent C compiler will come up
with similar code without inline assembly. For MSVC, the C code (when inlined)
will probably be faster than the inline assembly. It won't need the
mov ecx, dword ptr data
in many cases, because data may already be in a register. Register names will
not be hardcoded, which gives the compiler more freedom to optimize. With Gcc
things are slightly different (you don't need to hardcode registers), but still I
cannot imagine any advantage of inline assembly in this case. And of course
porting to a 64-bit computer will be no issue when using C.
For example:
typedef unsigned long UINT32;
int countBitsTrue(UINT32 data)
{
    UINT32 w = data;
    int ret = 0;
    if (w)
        do
            ret++;
        while ((w &= w-1) != 0);
    return ret;
}
cl -O2 -Fa popc32.c produces the following assembly:
TITLE popc32.c
.386P
include listing.inc
if @Version gt 510
.model FLAT
else
_TEXT SEGMENT PARA USE32 PUBLIC 'CODE'
_TEXT ENDS
_DATA SEGMENT DWORD USE32 PUBLIC 'DATA'
_DATA ENDS
CONST SEGMENT DWORD USE32 PUBLIC 'CONST'
CONST ENDS
_BSS SEGMENT DWORD USE32 PUBLIC 'BSS'
_BSS ENDS
_TLS SEGMENT DWORD USE32 PUBLIC 'TLS'
_TLS ENDS
; COMDAT _countBitsTrue
_TEXT SEGMENT PARA USE32 PUBLIC 'CODE'
_TEXT ENDS
FLAT GROUP _DATA, CONST, _BSS
ASSUME CS: FLAT, DS: FLAT, SS: FLAT
endif
PUBLIC _countBitsTrue
; COMDAT _countBitsTrue
_TEXT SEGMENT
_data$ = 8
_countBitsTrue PROC NEAR ; COMDAT
; File popc32.c
; Line 5
mov ecx, DWORD PTR _data$[esp-4]
; Line 6
xor eax, eax
; Line 7
test ecx, ecx
je SHORT $L96
$L94:
; Line 10
lea edx, DWORD PTR [ecx-1]
inc eax
and ecx, edx
jne SHORT $L94
$L96:
; Line 12
ret 0
_countBitsTrue ENDP
_TEXT ENDS
END
It is practically identical to your inline assembly! It should never be slower.
The same holds for Gcc, which produces more or less the same code. I think one
should stay away from inline assembly for this. I had some inline assembly in
Yace for a while, but got rid of it all. I would probably use it for FirstBit,
but I don't really need that. For a 64-bit version of popcount on 32-bit
hardware that probably runs at least as fast as the same algorithm with inline
assembly, see
http://f11.parsimony.net/forum16635/messages/31324.htm
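To illustrate the idea in plain C, here is a minimal sketch of a 64-bit count built from two 32-bit halves. It is not necessarily the code from the linked message. The names popcount32 and popcount64 and the use of <stdint.h> types are my assumptions for this example. It uses the same clear-the-lowest-bit loop as countBitsTrue above.

```c
#include <stdint.h>

/* Same algorithm as countBitsTrue: one loop iteration per set bit.
   The name popcount32 is hypothetical, chosen for this sketch. */
static int popcount32(uint32_t w)
{
    int ret = 0;
    while (w != 0) {
        w &= w - 1;   /* clear the lowest set bit */
        ret++;
    }
    return ret;
}

/* Hypothetical 64-bit wrapper: count each 32-bit half separately,
   so a 32-bit target never needs 64-bit arithmetic inside the loop. */
static int popcount64(uint64_t b)
{
    return popcount32((uint32_t)b) + popcount32((uint32_t)(b >> 32));
}
```

Splitting into halves keeps each loop on a single 32-bit register. Whether this beats a loop on the full 64-bit value depends on the compiler and the target, so it is worth measuring.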
Regards,
Dieter