Author: Gerd Isenberg
Date: 06:40:45 08/23/03
On August 23, 2003 at 04:21:28, Johan de Koning wrote:

>On August 23, 2003 at 03:45:09, Johan de Koning wrote:
>
>> ... 1 extra line in main() can
>>easily change the runtime by 1 or 2% (for reasons I haven't fathomed yet).
>
>I mean: I do understand it depends on code alignment.
>I can imagine the instruction pipeline feeds at very high speed from an "open"
>cache line. I can also imagine it is rather complicated to have more than 1
>cache line "open". But I can't imagine why I get random results.
>
>/**/ for( i = 0; i < top; i++ ) sum += i;
>;;;; more: add, inc, cmp, jl more
>
>This loop usually executes in 2 cycles. But depending on the alignment I get
>sometimes 2.667 or 4 or 4.5 cycles. Isn't that weird?!
>
>... Johan

Hi Johan,

OK, your loop body is about 10 bytes. If I look at the AMD Athlon Processor
x86 Code Optimization Guide, it becomes clearer. I guess the P4 is similar.

Regards,
Gerd

Page 49

4 Instruction Decoding Optimizations
...
Overview
--------------------------------------------------------------
The AMD Athlon processor instruction fetcher reads 16-byte aligned code
windows from the instruction cache. The instruction bytes are then merged
into a 24-byte instruction queue. On each cycle, the in-order front-end
engine selects for decode up to three x86 instructions from the
instruction-byte queue.
....

and Page 54

Align Branch Targets in Program Hot Spots

In program hot spots (as determined by either profiling or loop nesting
analysis), place branch targets at or near the beginning of 16-byte aligned
code windows. This guideline improves performance inside hotspots by
maximizing the number of instruction fills into the instruction-byte queue
and preserves I-cache space in branch intensive code outside such hotspots.
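
To make the 16-byte window point concrete, here is a rough sketch in C that
counts how many aligned fetch windows a loop body touches for a given start
offset. The 16-byte window size is taken from the guide excerpt above; the
function names windows_spanned and print_alignment_table are made up for
this illustration, and the 10-byte loop size is only the estimate from the
post. It does not model the decoder or predict cycle counts; it only shows
where a loop starts to straddle a window boundary.

```c
#include <assert.h>
#include <stdio.h>

#define FETCH_WINDOW 16u  /* window size quoted in the AMD guide excerpt */

/* Hypothetical helper: number of 16-byte aligned windows touched by the
   code bytes [start, start + len). */
static unsigned windows_spanned(unsigned start, unsigned len)
{
    assert(len > 0);  /* an empty range touches no window */
    unsigned first = start / FETCH_WINDOW;             /* window of first byte */
    unsigned last  = (start + len - 1) / FETCH_WINDOW; /* window of last byte  */
    return last - first + 1;
}

/* Hypothetical helper: print the window count for every possible start
   offset within one window. */
static void print_alignment_table(unsigned len)
{
    for (unsigned off = 0; off < FETCH_WINDOW; off++)
        printf("start offset %2u: %u window(s)\n",
               off, windows_spanned(off, len));
}
```

Calling print_alignment_table(10) from main() shows that a 10-byte loop fits
in one window only when it starts at offsets 0 through 6; from offset 7 on it
spans two windows, so every iteration needs bytes from two separate fetches.
That would at least be consistent with the timing changing when an unrelated
line shifts the loop by a few bytes, though the exact cycle counts depend on
details this sketch ignores.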
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.