Author: Dann Corbit
Date: 16:32:09 02/17/04
On February 17, 2004 at 19:18:46, Dieter Buerssner wrote:

>On February 17, 2004 at 16:10:10, Dann Corbit wrote:
>
>How did you create the source? Just the output (more or less) of the
>preprocessor run? It looks like it, but you still have the #include(s), which would
>not be in that output.
>
>Anyway, your assembly seems to strengthen my point. It does not look worse, than
>the "hand tuned" assembly code from (former versions of) Crafty.
>
>>_bitfiddling PROC NEAR ; COMDAT
>>
>>; 27 : unsigned long u,
>>; 28 : v;
>>; 29 : v = bb.w[1];
>>; 30 : u = bb.w[0];
>>; 31 : u = (u&0x55555555) + ((u>>1)&0x55555555);
>>
>>    mov eax, DWORD PTR _bb$[esp-4]
>>    mov ecx, eax
>>    and eax, 1431655765 ; 55555555H
>>    shr ecx, 1
>>    and ecx, 1431655765 ; 55555555H
>>    add ecx, eax
>>
>>; 32 : v = (v&0x55555555) + ((v>>1)&0x55555555);
>>
>>    mov eax, DWORD PTR _bb$[esp]
>>    mov edx, eax
>>    and eax, 1431655765 ; 55555555H
>>    shr edx, 1
>>    and edx, 1431655765 ; 55555555H
>>    add edx, eax
>>
>>; 33 : u = (u & 0x33333333) + ((u >> 2) & 0x33333333)
>>; 34 : + (v & 0x33333333) + ((v >> 2) & 0x33333333);
>>
>>    mov eax, ecx
>>    shr eax, 2
>>    and eax, 858993459 ; 33333333H
>>    push esi
>>    mov esi, edx
>>    shr esi, 2
>>    and ecx, 858993459 ; 33333333H
>>    and esi, 858993459 ; 33333333H
>>    add eax, esi
>>    add eax, ecx
>>    and edx, 858993459 ; 33333333H
>>    add eax, edx
>>
>>; 35 : u = (u&0x0f0f0f0f) + ((u>>4)&0x0f0f0f0f);
>>
>>    mov ecx, eax
>>    and eax, 252645135 ; 0f0f0f0fH
>>    shr ecx, 4
>>    and ecx, 252645135 ; 0f0f0f0fH
>>    add ecx, eax
>>
>>; 36 : u += u >> 8;
>>
>>    mov eax, ecx
>>    shr eax, 8
>>    add ecx, eax
>>
>>; 37 : u += u >> 16;
>>
>>    mov eax, ecx
>>    shr eax, 16 ; 00000010H
>>    add eax, ecx
>>
>>; 38 : return u & 0xff;
>>
>>    and eax, 255 ; 000000ffH
>>    pop esi
>>
>>; 39 : }
>>
>>    ret 0
>
>Less instructions here, than in x86.s of older crafties, that used the same idea
>(and this code will probably run faster). So - why use assembly for it?

I agree. A waste of time. Less portable. Probably slower to use hand-tuned
assembly. Also, sometimes inlined assembly will defeat the compiler
optimizations.

>>_bitfiddling ENDP
>>; Function compile flags: /Ogty
>>_TEXT ENDS
>>; COMDAT _dieter_popc
>>_TEXT SEGMENT
>>_a$ = 8 ; size = 8
>>_dieter_popc PROC NEAR ; COMDAT
>>
>>; 10 : unsigned long w;
>>; 11 : int n = 0;
>>; 12 : w = a.w[0];
>>
>>    mov ecx, DWORD PTR _a$[esp-4]
>>    xor eax, eax
>>
>>; 13 : if (w)
>>
>>    test ecx, ecx
>>    je SHORT $L552
>>    npad 6
>>$L550:
>>
>>; 14 : do
>>; 15 : n++;
>>; 16 : while ((w &= w - 1) != 0);
>>
>>    lea edx, DWORD PTR [ecx-1]
>>    inc eax
>>    and ecx, edx
>>    jne SHORT $L550
>>$L552:
>>
>>; 17 : w = a.w[1];
>>
>>    mov ecx, DWORD PTR _a$[esp]
>>
>>; 18 : if (w)
>>
>>    test ecx, ecx
>>    je SHORT $L556
>>$L554:
>>
>>; 19 : do
>>; 20 : n++;
>>; 21 : while ((w &= w - 1) != 0);
>>
>>    lea edx, DWORD PTR [ecx-1]
>>    inc eax
>>    and ecx, edx
>>    jne SHORT $L554
>>$L556:
>>
>>; 22 : return n;
>>; 23 : }
>>
>>    ret 0
>>_dieter_popc ENDP
>
>Almost exactly the same, as the inline assembly you posted some messages before.
>So, why use assembly for this? I mentioned already, that in many situations, the
>C code will probably be faster.
>
>>;=====================================
>>; And with SSE2 enabled, we get this:
>>;=====================================
>[snipped]
>
>I only looked fast over this. I don't see a difference to the above (neither
>would I expect any difference). Did you see a difference?

I did not notice one. Did not even look. Just generated it:
Monkey sees option. Monkey clicks option. Monkey posts to CCC.

I agree with all of your assertions.
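
For reference, the C statements that the compiler echoed as comments in the listing can be pieced back together roughly as below. This is only a sketch: the listing never shows the type of bb or a, so the bitboard struct here is an assumption, make_bb is a hypothetical helper added for illustration, and uint32_t stands in for the original unsigned long (32 bits under the compiler that produced the listing, but 64 bits on many current platforms). The function signatures and return types are likewise guessed from the PROC names and return statements.

```c
#include <stdint.h>

/* Assumed layout: a 64-bit board seen as two 32-bit words.
   The listing only shows accesses to .w[0] and .w[1]. */
typedef struct {
    uint32_t w[2];
} bitboard;

/* Hypothetical helper, not in the original post. */
bitboard make_bb(uint32_t lo, uint32_t hi)
{
    bitboard b;
    b.w[0] = lo;
    b.w[1] = hi;
    return b;
}

/* Parallel ("bit fiddling") count, from listing lines 27-39. */
int bitfiddling(bitboard bb)
{
    uint32_t u, v;
    v = bb.w[1];
    u = bb.w[0];
    /* Sum adjacent bits into 2-bit fields (each field holds 0..2). */
    u = (u & 0x55555555) + ((u >> 1) & 0x55555555);
    v = (v & 0x55555555) + ((v >> 1) & 0x55555555);
    /* Merge both words into 4-bit fields (each field holds 0..8). */
    u = (u & 0x33333333) + ((u >> 2) & 0x33333333)
      + (v & 0x33333333) + ((v >> 2) & 0x33333333);
    /* Sum into 8-bit fields (each holds 0..16), then fold the bytes. */
    u = (u & 0x0f0f0f0f) + ((u >> 4) & 0x0f0f0f0f);
    u += u >> 8;
    u += u >> 16;
    return u & 0xff;
}

/* Loop that clears the lowest set bit each pass, from listing lines 10-23. */
int dieter_popc(bitboard a)
{
    uint32_t w;
    int n = 0;
    w = a.w[0];
    if (w)
        do
            n++;
        while ((w &= w - 1) != 0);
    w = a.w[1];
    if (w)
        do
            n++;
        while ((w &= w - 1) != 0);
    return n;
}
```

Both functions should return the same count for any input. The loop version runs once per set bit, so it tends to suit sparse boards, while the parallel version does a fixed amount of work regardless of how many bits are set.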