Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Possible small improvement to hacky method

Author: Gerd Isenberg

Date: 16:14:18 12/10/02

Go up one level in this thread


On December 10, 2002 at 15:57:13, Matt Taylor wrote:

>On December 06, 2002 at 00:49:11, Walter Faxon wrote:
>
>>On December 02, 2002 at 20:27:39, Gerd Isenberg wrote:
>>
>>>Ahh.. i see,
>>>
>>>a special kind of popcount for consecutive trailing "ones".
>>>I love this interactive programming lessions.
>>>
>>>Gerd
>>>
>>>Btw.: Impressive assembler listing:
>>>
>>>00401003 8B 4C 24 0C          mov         ecx,dword ptr [esp+0Ch]
>>>00401007 8B 31                mov         esi,dword ptr [ecx]
>>>00401009 8B 79 04             mov         edi,dword ptr [ecx+4]
>>>0040100C 8B D6                mov         edx,esi
>>>0040100E 8B C7                mov         eax,edi
>>>00401010 83 C2 FF             add         edx,0FFFFFFFFh
>>>00401013 83 D0 FF             adc         eax,0FFFFFFFFh
>>>00401016 23 F2                and         esi,edx
>>>00401018 23 F8                and         edi,eax
>>>0040101A 89 31                mov         dword ptr [ecx],esi
>>>0040101C 89 79 04             mov         dword ptr [ecx+4],edi
>>>0040101F 8B CF                mov         ecx,edi
>>>00401021 33 D6                xor         edx,esi
>>>00401023 33 C1                xor         eax,ecx
>>>00401025 33 C2                xor         eax,edx
>>>00401027 5F                   pop         edi
>>>00401028 35 81 FC C5 01       xor         eax,1C5FC81h
>>>0040102D 5E                   pop         esi
>>>0040102E 8B D0                mov         edx,eax
>>>00401030 C1 EA 10             shr         edx,10h
>>>00401033 03 C2                add         eax,edx
>>>00401035 33 D2                xor         edx,edx
>>>00401037 8B C8                mov         ecx,eax
>>>00401039 C1 E9 08             shr         ecx,8
>>>0040103C 2A C1                sub         al,cl
>>>0040103E 25 FF 00 00 00       and         eax,0FFh
>>>00401043 8A 90 D5 62 40 00    mov         dl,byte ptr [eax+4062D5h]
>>>00401049 8B C2                mov         eax,edx
>>>0040104B C3                   ret
>>
>>
>>Hi, Gerd.
>>
>>I have noticed that most compilers aren't too smart where it comes to bitwise
>>dataflow analysis.  After the "add  eax, edx" instruction above, better (well,
>>shorter!) code would be:
>>
>>    sub    al, ah
>>    and    eax, 0FFh
>>    mov    al, byte ptr [eax+4062D5h]
>>    ret
>>
>>Perhaps if we added a little "hint" in the last 2 lines of source:
>>
>>inline                  // inline declaration may differ by compiler
>>u8 LSB_64( u64* bb )
>>    {
>>    u64 t64;
>>    u32 t32;
>>    u8 t8;
>>    t64 = *bb - 1;
>>    *bb &= t64;         // omit this line to retain current LSB
>>    t64 ^= *bb;
>>    t32 = (u32)t64 ^ (u32)(t64 >> 32);
>>    t32 ^= LSB_64_magic;
>>    t32 += t32 >> 16;
>>    t8 = (u8)t32 - (u8)(t32 >> 8);
>>    return LSB_64_table [LSB_64_adj + t8];
>>    }
>>
>>Of course, this really only helps because we already know that the x86 class can
>>independently address both the first and 2nd bytes of a register.  And it's
>>wishful thinking that much better asm code will actually be produced!  I'd be
>>interested to see.
>>
>>If/when you run your test suite again, please try this version.  Thanks.
>>
>>-- Walter
>
>I wanted to procrastinate studying for my finals, and I was writing this routine
>using MMX. MMX has an 8-byte word size, and some of these operations (like the
>xor) become atomic. The only problem is that I can't do 64-bit subtraction.
>Elsewhere the CPU makes emulation of large arithmetic quite easy by allowing one
>to add in (or subtract out) the carry bit, but MMX has no easy way of doing
>that.
>
>(As a side-note, the latest extensions to SSE include the ability to do atomic
>64-bit arithmetic in MMX registers, too, but since I do not have a P4, I'm not
>interested in exploring that route at present.)
>
>However, it dawned on me that perhaps I don't -need- to generate a carry. I was
>looking at the first 3 lines:
>
>>    t64 = *bb - 1;
>>    *bb &= t64;         // omit this line to retain current LSB
>>    t64 ^= *bb;
>
>It seems to me that this would create a mask in t64 of 2^LSB - 1, right?
>
>-Matt

Hi Matt,

my fastest to far bb-1:

        // bb in mm0
	pxor	  mm2, mm2  ; 0
	pcmpeqd	  mm3, mm3  ; -1
	pcmpeqd	  mm2, mm0  ; -hih*(bb == 0) : -low*(bb == 0)
	PUNPCKHDQ mm3, mm2  ; -low*(bb == 0) : -1
        paddd	  mm0, mm3  ; +(-1)

Gerd



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.