Author: Gerd Isenberg
Date: 12:33:35 03/31/04
>>>_chkstk() call is necessary if function allocates more than 4k on stack.
>>
>>I see - a page issue?
>
>The reason is the way Windows commits space on the stack (allocates physical
>memory). On program startup it reserves address space for your stack (1 MB by
>default), but commits much less (again, all of this is default behavior -- you
>can change it for your program). Next to the committed pages there is a guard page.
>If your program tries to access it there will be an interrupt, Windows will commit
>that page and mark the next one (with lower address) as the new guard page.
>
>So your programs will have a large stack by default, *and* the system will allocate
>only the necessary amount of physical memory. If there are 20 processes running on
>your box, such a strategy will save almost 20 MB of RAM.
>
>As a result of this design a program should not allocate more than 4k (on x86 and
>AMD64) on stack without touching intermediate pages first. If you try to
>access a not yet committed stack location that is too far from the current stack top
>you'll get an access violation.
>
>That is exactly what _chkstk() is doing -- it just "touches" intermediate pages
>if your function wants to allocate more than one page on stack.
>
>Performance impact of _chkstk() calls is very small, because the vast majority of
>functions have less than 4k of local variables. And if a function allocates more
>than 4k, several instructions inside _chkstk() would not be noticeable.
>[Actually we considered inlining _chkstk() when we are allocating only several
>pages, but decided against it, because there would be no observable performance
>gain on "normal" applications].
>

I see - makes sense.

>>>
>>>memset() call is faster than REP STOSQ. Trust me. BTW, the old version of the
>>>compiler would generate REP STOSQ.
>>
>>Yes, interesting.
>>Curious about what is inside memset ;-)
>
>Nothing really interesting :-) The function just looks at the alignment and
>size of the block that you are filling, and uses different algorithms for large
>aligned blocks, large unaligned blocks, medium-sized blocks, small blocks, etc.
>

Ok, I wondered why an aligned and unconditional REP STOSQ isn't faster,
especially with a small count (e.g. < 32 qwords), where the call/ret overhead
becomes relatively more expensive. I remember the AMD64 optimization manual
about that issue...

>>>
>>>And here is your assembly:
>>
>>Wow - absolutely convincing!
>>
>>Nice that all is inlined inside main, but the single functions are incarnated or
>>listed separately.
>>
>>One minor point I don't understand inside the general purpose incarnation:
>>
>>updownAttacks<GPR>, COMDAT
>>...
>>; Line 222
>>  ...
>>  mov QWORD PTR [rax-72], rbp
>>  ...
>>
>>; Line 224
>>  movaps xmm0, XMMWORD PTR [rax-72]
>>  movdqa XMMWORD PTR [rax-72], xmm0
>>
>>Some undocumented trick?
>
>No, just compiler stupidity :-) You are copying from "gu" to "gd":
>
>  T gd(gu);
>
>Compiler was intelligent enough to allocate both variables in the same stack
>location, but has not enough intelligence to get rid of the move (probably
>because formally types are different -- I did not look at the details yet). We
>cannot fix the issue prior to beta, but probably will fix it for the final
>release.

The final main inlining is rather free from such obstacles ;-)
And xmm- and gp-instructions from the two inlined functions are interlaced.
That's really great!

>
>And there are some other places for which we can generate better code. You
>probably did not notice them, but I see inefficiencies...
>

Maybe better instruction scheduling by using a few more registers?
It should be possible with these two inlined Kogge-Stone functions to process
four directions in parallel (two (three) xmm and two gpr). Even inside one
direction, generator and propagator calculation may be interlaced.
OTOH using xmm8-xmm15 implies an additional prefix byte, but for queens...

Cheers,
Gerd

>Thanks,
>Eugene
>
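
P.S. For readers without the listing at hand, here is a minimal sketch of what a Kogge-Stone up/down fill on a general purpose register can look like. It is not the code that was compiled in this thread: the names U64, northOccluded, southOccluded and upDownAttacks are made up for illustration, and it assumes a bitboard with a1 = bit 0 and eight bits per rank. The two fills share no data, which is why a compiler (or a human) is free to interlace their generator and propagator steps.

```cpp
#include <cstdint>

using U64 = std::uint64_t;  // hypothetical bitboard type: a1 = bit 0, h8 = bit 63

// Kogge-Stone occluded fill toward higher ranks.
// gen: generator set (the sliding pieces), pro: propagator set (empty squares).
inline U64 northOccluded(U64 gen, U64 pro) {
    gen |= pro & (gen << 8);    // extend by one rank
    pro &= pro << 8;            // propagator now spans two ranks
    gen |= pro & (gen << 16);   // extend by two more ranks
    pro &= pro << 16;           // propagator now spans four ranks
    gen |= pro & (gen << 32);   // extend by four more ranks
    return gen;
}

// Same fill toward lower ranks. Vertical shifts need no file wrap masks.
inline U64 southOccluded(U64 gen, U64 pro) {
    gen |= pro & (gen >> 8);
    pro &= pro >> 8;
    gen |= pro & (gen >> 16);
    pro &= pro >> 16;
    gen |= pro & (gen >> 32);
    return gen;
}

// Vertical attack set: shift each fill one more step so that the first
// blocker is included and the slider's own square is excluded.
inline U64 upDownAttacks(U64 sliders, U64 empty) {
    U64 up   = northOccluded(sliders, empty) << 8;
    U64 down = southOccluded(sliders, empty) >> 8;
    return up | down;
}
```

With a rook on a1 and an otherwise empty board, upDownAttacks(1, ~U64(1)) yields the rest of the a-file. The same three-step pattern works on 128-bit xmm registers with two bitboards per register, which is presumably where the gpr/xmm split discussed above comes from.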
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.