Author: Gerd Isenberg
Date: 02:06:11 06/14/05
On June 13, 2005 at 23:38:27, Eugene Nalimov wrote:

>On June 13, 2005 at 14:14:09, Gerd Isenberg wrote:
>
>>On June 13, 2005 at 13:45:16, Eugene Nalimov wrote:
>>
>>>On June 13, 2005 at 13:23:53, Gerd Isenberg wrote:
>>>
>>>>hi, compiler experts!
>>>>
>>>>Inside a recursive search routine (not alfa/beta but my fruit fly ;-) with only
>>>>this-pointer and one additional integer parameter and local, msc2005 wastes 40
>>>>bytes (72 with other optimizations) stackspace each call. A new stack
>>>>defragmentation trick by ms? For 8-byte alignment those paddings seem a bit too
>>>>huge. Each call eats one cacheline.
>>>>Can someone please explain what's going on here ;-)
>>>
>>>Calling conventions. You should reserve (I believe) 32 bytes on stack for
>>>function you are calling. Extra 8 bytes are because stack should be 16-byte
>>>aligned, but on function entry it is 8-byte aligned, and we are saving even
>>>number of registers.
>>
>>I see - usually we have some more variables on the stack - so the waste becomes
>>relatively smaller if not zero.
>>
>>Otoh there are 3 register parameters as well as a lot of remaining registers.
>>A recursive, very compact qsearch ...
>
>You compiled your function optimized for size (/O1), and because of that
>compiler decided to use very short PUSH/POP instructions to save/restore
>registers, even though it results in some unused slots on stack. If you compile
>your program optimizing for speed (/O2 or /Ox), compiler will use MOV
>instructions, and it will save registers into empty stack slots provided by
>caller:
>
>; Listing generated by Microsoft (R) Optimizing Compiler Version 14.00.50317
>...
>
>?searchDeBruijn@DeBruijnGenerator@@QEAAXI@Z PROC ; DeBruijnGenerator::searchDeBruijn
>    sub rsp, 40 ; 00000028H
>    mov QWORD PTR [rsp+48], rbx
>    mov rbx, rcx
>; Line 62
>    mov ecx, DWORD PTR [rcx+32]
>    cmp ecx, 1
>    mov QWORD PTR [rsp+72], rdi
>    jbe $LN5@searchDeBr
>    mov QWORD PTR [rsp+56], rbp
>    mov QWORD PTR [rsp+64], rsi
>    ...
>
>Is that what you want?

Not exactly ;-)
It seems to "save" 32 bytes (for the four push/pop) per call. But 40 + 8 bytes
(return address) is still a bit more than the "optimal" 16 bytes (8 bytes to push
"i", 8 bytes return address) - but of course that is related to the register
usage across (recursive) function calls, which, as you already mentioned, is
difficult, especially with indirect, virtual calls, even if const.

>
>(You can also see that compiler "shrink wrapped" save/restore for some
>registers, i.e. RBP and RSI would be saved/restored only if they are used in the
>function).
>
>>Well maybe an iterative approach for alfa/beta pays off even more.
>>
>>>
>>>>I also wonder whether it is not possible for the compiler to keep the class
>>>>members inside registers during the recursive search - dumb compiler ;-)
>>>
>>>We were thinking about such optimization, but had to prune it due to some more
>>>urgent needs. In any case you have indirect call in your function, so the
>>>optimization would not fire even were it implemented.
>>
>>The virtual const might be a hint.
>
>We are not using types for memory disambiguation (alias analysis). That is
>conscious decision. We know that by using type information we can improve
>quality of generated code. Unfortunately, by doing so we will also break a lot of
>existing code. Yes, that code is not standard compliant, but it always compiled
>and worked, it can be 20 years old, such bugs are very hard to trace, and we
>don't want our customers complain "typical MS product -- buggy compiler broke my
>code".

The "bane" of old, existing code. New motivation for assembly programmers.

Thanks for providing some insights,
Gerd
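
The stack arithmetic discussed in this thread can be sketched in a few lines of C++. This is only a rough model, not compiler output: the constants are the ones named in the thread (32 bytes reserved for the callee, 8-byte return address, 16-byte alignment), and the helper names `bytes_per_call` and `sub_rsp_amount` are made up for illustration. It assumes a non-leaf function whose frame holds nothing but pushed registers, locals and the reserved 32 bytes.

```cpp
#include <cassert>
#include <cstdint>

// Constants as described in the thread for the x64 calling convention.
constexpr std::uint32_t kShadowSpace   = 32; // reserved by caller for the callee
constexpr std::uint32_t kReturnAddress = 8;  // pushed by CALL
constexpr std::uint32_t kSlot          = 8;  // one PUSH = one 8-byte slot
constexpr std::uint32_t kAlignment     = 16; // RSP alignment required at a call

// Hypothetical helper: stack bytes one call level consumes, return address
// included. pushed_regs is the number of registers saved with PUSH (0 when
// they are stored with MOV into the slots provided by the caller).
constexpr std::uint32_t bytes_per_call(std::uint32_t pushed_regs,
                                       std::uint32_t local_bytes) {
    const std::uint32_t used = kReturnAddress + pushed_regs * kSlot
                             + local_bytes + kShadowSpace;
    // Round up so RSP is 16-byte aligned again at the next CALL.
    return (used + kAlignment - 1) / kAlignment * kAlignment;
}

// Hypothetical helper: the immediate in "sub rsp, N" under the same model,
// i.e. everything except the return address and the pushed registers.
constexpr std::uint32_t sub_rsp_amount(std::uint32_t pushed_regs,
                                       std::uint32_t local_bytes) {
    return bytes_per_call(pushed_regs, local_bytes)
         - kReturnAddress - pushed_regs * kSlot;
}

// MOV-based saves (/O2 in the thread): 40 + 8 bytes per call.
static_assert(bytes_per_call(0, 0) == 48, "40 + return address");
static_assert(sub_rsp_amount(0, 0) == 40, "matches 'sub rsp, 40'");
// Four PUSHes (/O1 in the thread): 32 bytes more per call.
static_assert(bytes_per_call(4, 0) == 80, "72 + return address");
```

Under this model the difference between the two variants is exactly the 32 bytes of the four PUSHes, and the extra 8 bytes in the 40 come from rounding up to the 16-byte alignment.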
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.