Author: Matt Taylor
Date: 12:57:13 12/10/02
Go up one level in this thread
On December 06, 2002 at 00:49:11, Walter Faxon wrote: >On December 02, 2002 at 20:27:39, Gerd Isenberg wrote: > >>Ahh.. i see, >> >>a special kind of popcount for consecutive trailing "ones". >>I love this interactive programming lessions. >> >>Gerd >> >>Btw.: Impressive assembler listing: >> >>00401003 8B 4C 24 0C mov ecx,dword ptr [esp+0Ch] >>00401007 8B 31 mov esi,dword ptr [ecx] >>00401009 8B 79 04 mov edi,dword ptr [ecx+4] >>0040100C 8B D6 mov edx,esi >>0040100E 8B C7 mov eax,edi >>00401010 83 C2 FF add edx,0FFFFFFFFh >>00401013 83 D0 FF adc eax,0FFFFFFFFh >>00401016 23 F2 and esi,edx >>00401018 23 F8 and edi,eax >>0040101A 89 31 mov dword ptr [ecx],esi >>0040101C 89 79 04 mov dword ptr [ecx+4],edi >>0040101F 8B CF mov ecx,edi >>00401021 33 D6 xor edx,esi >>00401023 33 C1 xor eax,ecx >>00401025 33 C2 xor eax,edx >>00401027 5F pop edi >>00401028 35 81 FC C5 01 xor eax,1C5FC81h >>0040102D 5E pop esi >>0040102E 8B D0 mov edx,eax >>00401030 C1 EA 10 shr edx,10h >>00401033 03 C2 add eax,edx >>00401035 33 D2 xor edx,edx >>00401037 8B C8 mov ecx,eax >>00401039 C1 E9 08 shr ecx,8 >>0040103C 2A C1 sub al,cl >>0040103E 25 FF 00 00 00 and eax,0FFh >>00401043 8A 90 D5 62 40 00 mov dl,byte ptr [eax+4062D5h] >>00401049 8B C2 mov eax,edx >>0040104B C3 ret > > >Hi, Gerd. > >I have noticed that most compilers aren't too smart where it comes to bitwise >dataflow analysis. After the "add eax, edx" instruction above, better (well, >shorter!) code would be: > > sub al, ah > and eax, 0FFh > mov al, byte ptr [eax+4062D5h] > ret > >Perhaps if we added a little "hint" in the last 2 lines of source: > >inline // inline declaration may differ by compiler >u8 LSB_64( u64* bb ) > { > u64 t64; > u32 t32; > u8 t8; > t64 = *bb - 1; > *bb &= t64; // omit this line to retain current LSB > t64 ^= *bb; > t32 = (u32)t64 ^ (u32)(t64 >> 32); > t32 ^= LSB_64_magic; > t32 += t32 >> 16; > t8 = (u8)t32 - (u8)(t32 >> 8); > return LSB_64_table [LSB_64_adj + t8]; > } > >Of course, this really only helps because we already know that the x86 class can >independently address both the first and 2nd bytes of a register. And it's >wishful thinking that much better asm code will actually be produced! I'd be >interested to see. > >If/when you run your test suite again, please try this version. Thanks. > >-- Walter I wanted to procrastinate studying for my finals, and I was writing this routine using MMX. MMX has an 8-byte word size, and some of these operations (like the xor) become atomic. The only problem is that I can't do 64-bit subtraction. Elsewhere the CPU makes emulation of large arithmetic quite easy by allowing one to add in (or subtract out) the carry bit, but MMX has no easy way of doing that. (As a side-note, the latest extensions to SSE include the ability to do atomic 64-bit arithmetic in MMX registers, too, but since I do not have a P4, I'm not interested in exploring that route at present.) However, it dawned on me that perhaps I don't -need- to generate a carry. I was looking at the first 3 lines: > t64 = *bb - 1; > *bb &= t64; // omit this line to retain current LSB > t64 ^= *bb; It seems to me that this would create a mask in t64 of 2^LSB - 1, right? -Matt
This page took 0.01 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.