Author: Dieter Buerssner
Date: 16:18:46 02/17/04
On February 17, 2004 at 16:10:10, Dann Corbit wrote:

How did you create the source? Is it just the output (more or less) of the preprocessor run? It looks like it, but you still have the #include(s), which would not be in that output.

Anyway, your assembly seems to strengthen my point. It does not look worse than the "hand tuned" assembly code from (former versions of) Crafty.

>_bitfiddling PROC NEAR ; COMDAT
>
>; 27 : unsigned long u,
>; 28 : v;
>; 29 : v = bb.w[1];
>; 30 : u = bb.w[0];
>; 31 : u = (u&0x55555555) + ((u>>1)&0x55555555);
>
>    mov eax, DWORD PTR _bb$[esp-4]
>    mov ecx, eax
>    and eax, 1431655765 ; 55555555H
>    shr ecx, 1
>    and ecx, 1431655765 ; 55555555H
>    add ecx, eax
>
>; 32 : v = (v&0x55555555) + ((v>>1)&0x55555555);
>
>    mov eax, DWORD PTR _bb$[esp]
>    mov edx, eax
>    and eax, 1431655765 ; 55555555H
>    shr edx, 1
>    and edx, 1431655765 ; 55555555H
>    add edx, eax
>
>; 33 : u = (u & 0x33333333) + ((u >> 2) & 0x33333333)
>; 34 : + (v & 0x33333333) + ((v >> 2) & 0x33333333);
>
>    mov eax, ecx
>    shr eax, 2
>    and eax, 858993459 ; 33333333H
>    push esi
>    mov esi, edx
>    shr esi, 2
>    and ecx, 858993459 ; 33333333H
>    and esi, 858993459 ; 33333333H
>    add eax, esi
>    add eax, ecx
>    and edx, 858993459 ; 33333333H
>    add eax, edx
>
>; 35 : u = (u&0x0f0f0f0f) + ((u>>4)&0x0f0f0f0f);
>
>    mov ecx, eax
>    and eax, 252645135 ; 0f0f0f0fH
>    shr ecx, 4
>    and ecx, 252645135 ; 0f0f0f0fH
>    add ecx, eax
>
>; 36 : u += u >> 8;
>
>    mov eax, ecx
>    shr eax, 8
>    add ecx, eax
>
>; 37 : u += u >> 16;
>
>    mov eax, ecx
>    shr eax, 16 ; 00000010H
>    add eax, ecx
>
>; 38 : return u & 0xff;
>
>    and eax, 255 ; 000000ffH
>    pop esi
>
>; 39 : }
>
>    ret 0

Fewer instructions here than in the x86.s of older Crafty versions, which used the same idea (and this code will probably run faster). So why use assembly for it?
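For reference, the C source can be read back out of the `; NN :` comment lines in the listing above. What follows is only a sketch under assumptions: the `bitboard_t` type is hypothetical (the listing shows only that `bb` is 8 bytes in size and has a two-element member `w`), the `int` return type is a guess, and `uint32_t` stands in for the original `unsigned long`, which was 32 bits wide for that compiler but is 64 bits on many current platforms.

```c
#include <stdint.h>

/* Hypothetical bitboard type (an assumption, not from the post):
   two 32-bit halves, passed by value like the 8-byte `bb` in the listing. */
typedef struct { uint32_t w[2]; } bitboard_t;

/* Parallel bit count, following source lines 27-38 of the quoted listing. */
int bitfiddling(bitboard_t bb)
{
    uint32_t u, v;

    v = bb.w[1];
    u = bb.w[0];
    /* Each 2-bit field now holds the count of its two bits (0..2). */
    u = (u & 0x55555555) + ((u >> 1) & 0x55555555);
    v = (v & 0x55555555) + ((v >> 1) & 0x55555555);
    /* Merge both halves: each 4-bit field holds at most 8, so no overflow. */
    u = (u & 0x33333333) + ((u >> 2) & 0x33333333)
      + (v & 0x33333333) + ((v >> 2) & 0x33333333);
    /* Each byte holds at most 16. */
    u = (u & 0x0f0f0f0f) + ((u >> 4) & 0x0f0f0f0f);
    /* Fold the four byte counts into the low byte (at most 64). */
    u += u >> 8;
    u += u >> 16;
    return (int)(u & 0xff);
}
```

Adding the two halves together at the 4-bit stage is what saves work compared with counting each 32-bit word separately: from that point on only one register has to be reduced.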
>_bitfiddling ENDP
>; Function compile flags: /Ogty
>_TEXT ENDS
>; COMDAT _dieter_popc
>_TEXT SEGMENT
>_a$ = 8 ; size = 8
>_dieter_popc PROC NEAR ; COMDAT
>
>; 10 : unsigned long w;
>; 11 : int n = 0;
>; 12 : w = a.w[0];
>
>    mov ecx, DWORD PTR _a$[esp-4]
>    xor eax, eax
>
>; 13 : if (w)
>
>    test ecx, ecx
>    je SHORT $L552
>    npad 6
>$L550:
>
>; 14 : do
>; 15 : n++;
>; 16 : while ((w &= w - 1) != 0);
>
>    lea edx, DWORD PTR [ecx-1]
>    inc eax
>    and ecx, edx
>    jne SHORT $L550
>$L552:
>
>; 17 : w = a.w[1];
>
>    mov ecx, DWORD PTR _a$[esp]
>
>; 18 : if (w)
>
>    test ecx, ecx
>    je SHORT $L556
>$L554:
>
>; 19 : do
>; 20 : n++;
>; 21 : while ((w &= w - 1) != 0);
>
>    lea edx, DWORD PTR [ecx-1]
>    inc eax
>    and ecx, edx
>    jne SHORT $L554
>$L556:
>
>; 22 : return n;
>; 23 : }
>
>    ret 0
>_dieter_popc ENDP

Almost exactly the same as the inline assembly you posted a few messages earlier. So why use assembly for this? I already mentioned that in many situations the C code will probably be faster.

>;=====================================
>; And with SSE2 enabled, we get this:
>;=====================================

[snipped]

I only skimmed this. I don't see a difference from the above (nor would I expect one). Did you see a difference?

Regards,
Dieter
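The C source behind the `_dieter_popc` listing quoted above can likewise be read back from its `; NN :` comment lines. Again a sketch under assumptions rather than the original file: `bitboard_t` is a hypothetical type (the listing shows only an 8-byte parameter `a` with a two-element member `w`), and `uint32_t` replaces the original `unsigned long` so the halves stay 32 bits wide on current platforms.

```c
#include <stdint.h>

/* Hypothetical bitboard type (an assumption, not from the post). */
typedef struct { uint32_t w[2]; } bitboard_t;

/* Loop-based bit count, following source lines 10-22 of the quoted listing. */
int dieter_popc(bitboard_t a)
{
    uint32_t w;
    int n = 0;

    w = a.w[0];
    if (w)
        do
            n++;
        while ((w &= w - 1) != 0);  /* w & (w - 1) clears the lowest set bit */

    w = a.w[1];
    if (w)
        do
            n++;
        while ((w &= w - 1) != 0);

    return n;
}
```

Each loop runs once per set bit, which matches the compact `lea` / `inc` / `and` / `jne` sequence in the listing; the cost therefore depends on how many bits are set rather than on the word size.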
Last modified: Thu, 15 Apr 21 08:11:13 -0700