Author: Frank Phillips
Date: 03:33:00 12/07/02
Go up one level in this thread
On December 06, 2002 at 00:49:11, Walter Faxon wrote: >On December 02, 2002 at 20:27:39, Gerd Isenberg wrote: > >>Ahh.. i see, >> >>a special kind of popcount for consecutive trailing "ones". >>I love this interactive programming lessions. >> >>Gerd >> >>Btw.: Impressive assembler listing: >> >>00401003 8B 4C 24 0C mov ecx,dword ptr [esp+0Ch] >>00401007 8B 31 mov esi,dword ptr [ecx] >>00401009 8B 79 04 mov edi,dword ptr [ecx+4] >>0040100C 8B D6 mov edx,esi >>0040100E 8B C7 mov eax,edi >>00401010 83 C2 FF add edx,0FFFFFFFFh >>00401013 83 D0 FF adc eax,0FFFFFFFFh >>00401016 23 F2 and esi,edx >>00401018 23 F8 and edi,eax >>0040101A 89 31 mov dword ptr [ecx],esi >>0040101C 89 79 04 mov dword ptr [ecx+4],edi >>0040101F 8B CF mov ecx,edi >>00401021 33 D6 xor edx,esi >>00401023 33 C1 xor eax,ecx >>00401025 33 C2 xor eax,edx >>00401027 5F pop edi >>00401028 35 81 FC C5 01 xor eax,1C5FC81h >>0040102D 5E pop esi >>0040102E 8B D0 mov edx,eax >>00401030 C1 EA 10 shr edx,10h >>00401033 03 C2 add eax,edx >>00401035 33 D2 xor edx,edx >>00401037 8B C8 mov ecx,eax >>00401039 C1 E9 08 shr ecx,8 >>0040103C 2A C1 sub al,cl >>0040103E 25 FF 00 00 00 and eax,0FFh >>00401043 8A 90 D5 62 40 00 mov dl,byte ptr [eax+4062D5h] >>00401049 8B C2 mov eax,edx >>0040104B C3 ret > > >Hi, Gerd. > >I have noticed that most compilers aren't too smart where it comes to bitwise >dataflow analysis. After the "add eax, edx" instruction above, better (well, >shorter!) code would be: > > sub al, ah > and eax, 0FFh > mov al, byte ptr [eax+4062D5h] > ret > >Perhaps if we added a little "hint" in the last 2 lines of source: > >inline // inline declaration may differ by compiler >u8 LSB_64( u64* bb ) > { > u64 t64; > u32 t32; > u8 t8; > t64 = *bb - 1; > *bb &= t64; // omit this line to retain current LSB > t64 ^= *bb; > t32 = (u32)t64 ^ (u32)(t64 >> 32); > t32 ^= LSB_64_magic; > t32 += t32 >> 16; > t8 = (u8)t32 - (u8)(t32 >> 8); > return LSB_64_table [LSB_64_adj + t8]; > } > >Of course, this really only helps because we already know that the x86 class can >independently address both the first and 2nd bytes of a register. And it's >wishful thinking that much better asm code will actually be produced! I'd be >interested to see. > >If/when you run your test suite again, please try this version. Thanks. > >-- Walter Has anybody posted Walter's method with the table for 0 == highest order bit? Frank
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.