Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Possible small improvement to hacky method

Author: Matt Taylor

Date: 18:27:15 12/10/02

Go up one level in this thread


On December 10, 2002 at 19:14:18, Gerd Isenberg wrote:

>On December 10, 2002 at 15:57:13, Matt Taylor wrote:
>
>>On December 06, 2002 at 00:49:11, Walter Faxon wrote:
>>
>>>On December 02, 2002 at 20:27:39, Gerd Isenberg wrote:
>>>
>>>>Ahh.. i see,
>>>>
>>>>a special kind of popcount for consecutive trailing "ones".
>>>>I love this interactive programming lessions.
>>>>
>>>>Gerd
>>>>
>>>>Btw.: Impressive assembler listing:
>>>>
>>>>00401003 8B 4C 24 0C          mov         ecx,dword ptr [esp+0Ch]
>>>>00401007 8B 31                mov         esi,dword ptr [ecx]
>>>>00401009 8B 79 04             mov         edi,dword ptr [ecx+4]
>>>>0040100C 8B D6                mov         edx,esi
>>>>0040100E 8B C7                mov         eax,edi
>>>>00401010 83 C2 FF             add         edx,0FFFFFFFFh
>>>>00401013 83 D0 FF             adc         eax,0FFFFFFFFh
>>>>00401016 23 F2                and         esi,edx
>>>>00401018 23 F8                and         edi,eax
>>>>0040101A 89 31                mov         dword ptr [ecx],esi
>>>>0040101C 89 79 04             mov         dword ptr [ecx+4],edi
>>>>0040101F 8B CF                mov         ecx,edi
>>>>00401021 33 D6                xor         edx,esi
>>>>00401023 33 C1                xor         eax,ecx
>>>>00401025 33 C2                xor         eax,edx
>>>>00401027 5F                   pop         edi
>>>>00401028 35 81 FC C5 01       xor         eax,1C5FC81h
>>>>0040102D 5E                   pop         esi
>>>>0040102E 8B D0                mov         edx,eax
>>>>00401030 C1 EA 10             shr         edx,10h
>>>>00401033 03 C2                add         eax,edx
>>>>00401035 33 D2                xor         edx,edx
>>>>00401037 8B C8                mov         ecx,eax
>>>>00401039 C1 E9 08             shr         ecx,8
>>>>0040103C 2A C1                sub         al,cl
>>>>0040103E 25 FF 00 00 00       and         eax,0FFh
>>>>00401043 8A 90 D5 62 40 00    mov         dl,byte ptr [eax+4062D5h]
>>>>00401049 8B C2                mov         eax,edx
>>>>0040104B C3                   ret
>>>
>>>
>>>Hi, Gerd.
>>>
>>>I have noticed that most compilers aren't too smart where it comes to bitwise
>>>dataflow analysis.  After the "add  eax, edx" instruction above, better (well,
>>>shorter!) code would be:
>>>
>>>    sub    al, ah
>>>    and    eax, 0FFh
>>>    mov    al, byte ptr [eax+4062D5h]
>>>    ret
>>>
>>>Perhaps if we added a little "hint" in the last 2 lines of source:
>>>
>>>inline                  // inline declaration may differ by compiler
>>>u8 LSB_64( u64* bb )
>>>    {
>>>    u64 t64;
>>>    u32 t32;
>>>    u8 t8;
>>>    t64 = *bb - 1;
>>>    *bb &= t64;         // omit this line to retain current LSB
>>>    t64 ^= *bb;
>>>    t32 = (u32)t64 ^ (u32)(t64 >> 32);
>>>    t32 ^= LSB_64_magic;
>>>    t32 += t32 >> 16;
>>>    t8 = (u8)t32 - (u8)(t32 >> 8);
>>>    return LSB_64_table [LSB_64_adj + t8];
>>>    }
>>>
>>>Of course, this really only helps because we already know that the x86 class can
>>>independently address both the first and 2nd bytes of a register.  And it's
>>>wishful thinking that much better asm code will actually be produced!  I'd be
>>>interested to see.
>>>
>>>If/when you run your test suite again, please try this version.  Thanks.
>>>
>>>-- Walter
>>
>>I wanted to procrastinate studying for my finals, and I was writing this routine
>>using MMX. MMX has an 8-byte word size, and some of these operations (like the
>>xor) become atomic. The only problem is that I can't do 64-bit subtraction.
>>Elsewhere the CPU makes emulation of large arithmetic quite easy by allowing one
>>to add in (or subtract out) the carry bit, but MMX has no easy way of doing
>>that.
>>
>>(As a side-note, the latest extensions to SSE include the ability to do atomic
>>64-bit arithmetic in MMX registers, too, but since I do not have a P4, I'm not
>>interested in exploring that route at present.)
>>
>>However, it dawned on me that perhaps I don't -need- to generate a carry. I was
>>looking at the first 3 lines:
>>
>>>    t64 = *bb - 1;
>>>    *bb &= t64;         // omit this line to retain current LSB
>>>    t64 ^= *bb;
>>
>>It seems to me that this would create a mask in t64 of 2^LSB - 1, right?
>>
>>-Matt
>
>Hi Matt,
>
>my fastest to far bb-1:
>
>        // bb in mm0
>	pxor	  mm2, mm2  ; 0
>	pcmpeqd	  mm3, mm3  ; -1
>	pcmpeqd	  mm2, mm0  ; -hih*(bb == 0) : -low*(bb == 0)
>	PUNPCKHDQ mm3, mm2  ; -low*(bb == 0) : -1
>        paddd	  mm0, mm3  ; +(-1)
>
>Gerd

I should note that the punpckhdq should be punpckldq or else you get the wrong
answer.

It didn't help my MMX routine much. Part of is it the necessary movd at the end
of the routine. The other performance hinderance here is lots of dependencies.
Sigh...

Thanks anyway.

-Matt



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.