Author: Walter Faxon
Date: 01:32:19 10/17/05
On October 16, 2005 at 03:58:56, Gerd Isenberg wrote:
>On October 16, 2005 at 01:33:08, Scott Gasch wrote:
>
>>Again, with FASTCALL on the function. This is MS C++ .net 2003 I think. I left
>>the comments in:
>>
>>; Listing generated by Microsoft (R) Optimizing Compiler Version 13.10.3077
>>
>>PUBLIC @getDayIndex1March00@12
>>; Function compile flags: /Ogty
>>_TEXT SEGMENT
>>_year$ = 8 ; size = 4
>>@getDayIndex1March00@12 PROC NEAR
>>; _day$ = ecx
>>; _month$ = edx
>>
>>; 637 : {
>>
>> push ebx
>> push esi
>>
>>; 638 : static int daysTilMonth[12] =
>>; 639 : {
>>; 640 : 6*31 + 4*30, // jan
>>; 641 : 7*31 + 4*30, // feb
>>; 642 : 0*31 + 0*30, // mar
>>; 643 : 1*31 + 0*30, // apr
>>; 644 : 1*31 + 1*30, // may
>>; 645 : 2*31 + 1*30, // jun
>>; 646 : 2*31 + 2*30, // jul
>>; 647 : 3*31 + 2*30, // aug
>>; 648 : 4*31 + 2*30, // sep
>>; 649 : 4*31 + 3*30, // oct
>>; 650 : 5*31 + 3*30, // nov
>>; 651 : 5*31 + 4*30, // dec
>>; 652 : };
>>; 653 : unsigned int cent, didx;
>>; 654 : year -= (month < 3);
>>
>> mov esi, DWORD PTR _year$[esp+4]
>> push edi
>> mov edi, edx
>
>Month is passed in edx, but edx is overwritten later by the 32*32=64-bit mul,
>so the register edi is used to keep that parameter for the later array index.
>Unfortunately edi is non-volatile and must be saved/restored on the stack!?
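
For readers following the listing: the 32*32=64-bit mul is how the compiler performs year / 100 without a div instruction. A minimal C sketch of that step, assuming year is a 32-bit unsigned value (the helper name div100 is hypothetical, not from the original source):

```c
#include <stdint.h>

/* Divide a 32-bit unsigned value by 100 using a multiply and a shift.
   0x51EB851F (1374389535) is ceil(2^37 / 100). Taking the high half of
   the 64-bit product and shifting it right by 5 is a total shift of 37. */
uint32_t div100(uint32_t year)
{
    uint64_t product = (uint64_t)year * 0x51EB851Fu; /* mul esi -> edx:eax */
    uint32_t high = (uint32_t)(product >> 32);       /* the edx half */
    return high >> 5;                                /* shr edx, 5 */
}
```

The high half of the product lands in edx, which is why the month parameter has to be moved out of that register first.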
>
>> cmp edi, 3
>> sbb eax, eax
>> neg eax
>> sub esi, eax
>
>This is really funny: it is a one-to-one encoding of
>year -= (month < 3). At least it is branchless ;-)
>
>The first optimization idea is to replace the neg/sub pair with a single add:
>
> cmp edi, 3
> sbb eax, eax
> add esi, eax
>
>The second optimization, as already mentioned, is to subtract the carry (borrow)
>directly from the year register, as gcc does:
>
> cmp edi, 3
> sbb esi, 0
>
>>
>>; 655 : cent = year / 100;
>>
>> mov eax, 1374389535 ; 51eb851fH
>> mul esi
>>
>>; 656 : didx = year * 365 + (year>>2) - cent + (cent>>2)
>>; 657 : + daysTilMonth[month-1] + day;
>>
>> lea eax, DWORD PTR [esi+esi*8]
>> lea eax, DWORD PTR [esi+eax*8]
>> lea ebx, DWORD PTR [eax+eax*4]
>
>
>I guess the three leas take the full 3*2 = 6 cycles.
>For AMD64 I suggest 3 cycles and less code:
> imul ebx, esi, 365
>Is mul still slower on Intel CPUs (P4, Centrino)?
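
To see why the three leas amount to a multiplication by 365, here is a small C sketch that mirrors them step by step (the function name is hypothetical): 9*8+1 = 73 and 73*5 = 365.

```c
#include <stdint.h>

/* Multiply by 365 using only shift-and-add steps, one per lea. */
uint32_t times365_lea(uint32_t y)
{
    uint32_t t = y + y * 8; /* lea eax, [esi+esi*8]  ->   9*y */
    t = y + t * 8;          /* lea eax, [esi+eax*8]  ->  73*y */
    return t + t * 4;       /* lea ebx, [eax+eax*4]  -> 365*y */
}
```

Each step depends on the result of the previous one, so the three instructions cannot overlap, which is the point of the cycle estimate above.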
>
>
>>
>>; 658 : return didx;
>>
>> mov eax, DWORD PTR ?daysTilMonth@?1??getDayIndex1March00@@9@9[edi*4-4]
>> shr edx, 5
>> add eax, ebx
>> mov edi, edx
>> shr edi, 2
>> add eax, edi
>> shr esi, 2
>> add eax, esi
>> pop edi
>> sub eax, edx
>> pop esi
>> add eax, ecx
>
>
>Only here, finally, is the fastcall register ecx for "day" used.
>Hmm, it would probably be smarter to swap year and day in the parameter list.
>
>> pop ebx
>>
>>; 659 : }
>>
>> ret 4
>>@getDayIndex1March00@12 ENDP
>>_TEXT ENDS
>>END
>
>Thanks,
>Gerd
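
For reference, the C source embedded in the listing comments can be reassembled roughly as follows. The body is taken from those comments; the return type, the parameter types, and the parameter order (day, month, year, inferred from the register assignment) are assumptions, and the MSVC-specific __fastcall keyword is left out so the sketch stays portable:

```c
/* Day index counted on a calendar that starts each year on 1 March,
   reconstructed from the listing comments. Signature is an assumption. */
unsigned int getDayIndex1March00(unsigned int day,
                                 unsigned int month,
                                 unsigned int year)
{
    static const int daysTilMonth[12] = {
        6*31 + 4*30, /* jan */
        7*31 + 4*30, /* feb */
        0*31 + 0*30, /* mar */
        1*31 + 0*30, /* apr */
        1*31 + 1*30, /* may */
        2*31 + 1*30, /* jun */
        2*31 + 2*30, /* jul */
        3*31 + 2*30, /* aug */
        4*31 + 2*30, /* sep */
        4*31 + 3*30, /* oct */
        5*31 + 3*30, /* nov */
        5*31 + 4*30, /* dec */
    };
    unsigned int cent, didx;

    year -= (month < 3);  /* January and February belong to the previous March-based year */
    cent = year / 100;
    didx = year * 365 + (year >> 2) - cent + (cent >> 2)
         + daysTilMonth[month - 1] + day;
    return didx;
}
```

A quick sanity check is that consecutive dates give consecutive indices, including across the end of February in a leap year and across New Year.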
Hi, Gerd.
Since this subject of low-level code optimization is so interesting to you, have
you thought of applying to join Eugene at the Intel compiler group (maybe after
some additional preliminary study)?
-- Walter