Computer Chess Club Archives


Subject: Re: odd msc 2005 behaviour - 64-bit mode

Author: Gerd Isenberg

Date: 02:06:11 06/14/05


On June 13, 2005 at 23:38:27, Eugene Nalimov wrote:

>On June 13, 2005 at 14:14:09, Gerd Isenberg wrote:
>
>>On June 13, 2005 at 13:45:16, Eugene Nalimov wrote:
>>
>>>On June 13, 2005 at 13:23:53, Gerd Isenberg wrote:
>>>
>>>>hi, compiler experts!
>>>>
>>>>Inside a recursive search routine (not alpha/beta but my fruit fly ;-) with only
>>>>the this-pointer, one additional integer parameter and a local, msc2005 wastes 40
>>>>bytes (72 with other optimizations) of stack space on each call. A new stack
>>>>defragmentation trick by MS? For 8-byte alignment that padding seems a bit too
>>>>huge. Each call eats one cache line.
>>>>Can someone please explain what's going on here ;-)
>>>
>>>Calling conventions. You should reserve (I believe) 32 bytes on the stack for
>>>the function you are calling. The extra 8 bytes are there because the stack should
>>>be 16-byte aligned, but on function entry it is only 8-byte aligned, and we are
>>>saving an even number of registers.
>>
>>I see - usually we have some more variables on the stack - so the waste becomes
>>relatively smaller, if not zero.
>>
>>Otoh there are 3 register parameters as well as a lot of remaining registers.
>>A recursive, very compact qsearch ...
>
>You compiled your function optimized for size (/O1), and because of that the
>compiler decided to use very short PUSH/POP instructions to save/restore
>registers, even though that results in some unused slots on the stack. If you
>compile your program optimizing for speed (/O2 or /Ox), the compiler will use MOV
>instructions, and it will save registers into the empty stack slots provided by
>the caller:
>
>; Listing generated by Microsoft (R) Optimizing Compiler Version 14.00.50317
>...
>
>?searchDeBruijn@DeBruijnGenerator@@QEAAXI@Z PROC	; DeBruijnGenerator::searchDeBruijn
>	sub	rsp, 40					; 00000028H
>	mov	QWORD PTR [rsp+48], rbx
>	mov	rbx, rcx
>; Line 62
>	mov	ecx, DWORD PTR [rcx+32]
>	cmp	ecx, 1
>	mov	QWORD PTR [rsp+72], rdi
>	jbe	$LN5@searchDeBr
>	mov	QWORD PTR [rsp+56], rbp
>	mov	QWORD PTR [rsp+64], rsi
>	...
>
>Is that what you want?

Not exactly ;-)

It seems to "save" 32 bytes (for the four push/pop pairs) per call.
But 40 + 8 bytes (return address) is still a bit more than the "optimal"
16 bytes (8 bytes to push "i", 8 bytes for the return address). Of course that is
related to register usage across (recursive) function calls, which, as you already
mentioned, is difficult, especially with indirect, virtual calls, even if const.
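
For anyone following the numbers, here is a minimal sketch of the arithmetic. It assumes the Windows x64 convention as described above: an 8-byte return address, 32 bytes reserved for the callee, and 16-byte alignment at each call. The helper names stackAdjust and bytesPerCall are made up for illustration, the four-push case only models what an /O1 build with four PUSHes would cost under those assumptions, and the code needs C++14 or later.

```cpp
#include <cstddef>

// Windows x64 calling-convention constants, as assumed in this sketch.
constexpr std::size_t kReturnAddress = 8;   // pushed by CALL
constexpr std::size_t kShadowSpace   = 32;  // reserved by the caller for the callee
constexpr std::size_t kAlignment     = 16;  // RSP alignment required at each CALL

// Hypothetical helper: bytes a non-leaf function subtracts from RSP after
// pushing `pushedRegs` registers, when it needs `localBytes` of locals.
constexpr std::size_t stackAdjust(std::size_t pushedRegs, std::size_t localBytes) {
    // Measured from the caller's 16-byte-aligned RSP just before the CALL.
    const std::size_t used    = kReturnAddress + 8 * pushedRegs;
    const std::size_t needed  = used + kShadowSpace + localBytes;
    const std::size_t rounded = (needed + kAlignment - 1) / kAlignment * kAlignment;
    return rounded - used;
}

// Hypothetical helper: total stack consumed per recursion level.
constexpr std::size_t bytesPerCall(std::size_t pushedRegs, std::size_t localBytes) {
    return kReturnAddress + 8 * pushedRegs + stackAdjust(pushedRegs, localBytes);
}

// MOV-based saves into the caller's slots: matches "sub rsp, 40" in the listing.
static_assert(stackAdjust(0, 0) == 40, "32 bytes reserved + 8 bytes of alignment");
static_assert(bytesPerCall(0, 0) == 48, "40 + 8 bytes per call");
// Four PUSHes (an even number) leave the alignment padding in place.
static_assert(bytesPerCall(4, 0) == 80, "32 bytes more than the MOV variant");
```

Under this model the difference between the two variants is exactly the 32 bytes of pushed registers, and an odd number of pushes would absorb the 8 bytes of padding.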

>
>(You can also see that the compiler "shrink wrapped" the save/restore for some
>registers, i.e. RBP and RSI would be saved/restored only if they are used in the
>function).
>
>>Well, maybe an iterative approach for alpha/beta pays off even more.
>>
>>>
>>>>I also wonder whether it is not possible for the compiler to keep the class
>>>>members inside registers during the recursive search - dumb compiler ;-)
>>>
>>>We were thinking about such an optimization, but had to prune it due to some more
>>>urgent needs. In any case you have an indirect call in your function, so the
>>>optimization would not fire even were it implemented.
>>
>>The virtual const might be a hint.
>
>We are not using types for memory disambiguation (alias analysis). That is a
>conscious decision. We know that by using type information we can improve the
>quality of generated code. Unfortunately, by doing so we will also break a lot of
>existing code. Yes, that code is not standard-compliant, but it always compiled
>and worked, it can be 20 years old, such bugs are very hard to trace, and we
>don't want our customers to complain "typical MS product -- buggy compiler broke my
>code".
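
To make the trade-off concrete, here is a small sketch of what type-based alias analysis would and would not allow. The function names storeThenLoad and floatBits are hypothetical, not taken from the code discussed in this thread, and floatBits assumes a 32-bit IEEE-754 float.

```cpp
#include <cstdint>
#include <cstring>

// A compiler that uses types for alias analysis may assume an int* and a
// float* never refer to the same object and simply return the constant 1.
// A compiler that does not must reload *counter after the store via scratch.
inline int storeThenLoad(int* counter, float* scratch) {
    *counter = 1;
    *scratch = 0.0f;
    return *counter;
}

// Non-conforming pattern often found in old code (comment only, not compiled):
//     std::uint32_t bits = *reinterpret_cast<std::uint32_t*>(&f);
// Conforming replacement: copy the object representation instead.
inline std::uint32_t floatBits(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Code written like the commented-out cast is the kind that keeps working only as long as the compiler ignores types when deciding what may alias.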

The "bane" of old, existing code.
New motivation for assembly programmers.
Thanks for providing some insights,
Gerd




Last modified: Thu, 15 Apr 21 08:11:13 -0700
