Author: Robert Hyatt
Date: 07:57:36 06/07/01
On June 07, 2001 at 05:10:26, martin fierz wrote:

>hi,
>
>this is from an earlier post of bob hyatt:
>
>>That is a basic optimization strategy... variables used close together in
>>time should be close together in memory to take advantage of 32-byte line
>>fills in cache.
>
>is this something which everybody is doing? is it worth the trouble to move
>around variable declarations? if this is optimized, how much performance
>gain can you expect compared to random variable placing?

It is definitely worth doing. The problem is that the ANSI C standard does not guarantee that two variables declared in adjacent declarations are adjacent in memory, unless they are both in the same structure. Any good book on optimizing teaches this as an early principle. For math problems, there are the so-called "blocked" algorithms that work on their data in smaller chunks so that each chunk fits into cache.

The performance gain is harder to predict. The cache fills an entire 32-byte line in one burst, and a line holds eight 32-bit values, so good placement has the potential to eliminate 7 unnecessary memory reads out of 8.

>are there any other strategies to optimize a program for good cache performance?
>and how do you measure this, if ordering the variables in one function causes
>that function to be 1% faster & the overall program 0.01% or something? just
>so i can try it in one function for starters...

1% is simply 1%. Pick up a few of those and it becomes 10%, etc.

>
>here's another observation i made: i have a P4 [i know... no need to tell me
>that it's a bad choice :-)] desktop and a P3 laptop. i tried bobs recommendation
>to use char and short arrays for small variables (like my
>"lastbit" array), and on the P3 this was indeed faster - on the P4 it was
>slower though. is this to be expected?

I haven't studied the P4 pipeline at all. It is longer. Why it would be worse for bytes than for words is hard to say. Accessing bytes has always been harder than accessing words, but the extra memory bandwidth that words consume has made using bytes pay off. Perhaps the RDRAM in the P4 is the issue here.

>
>cheers
> martin
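
A minimal C sketch of the structure point and the cache-line arithmetic, under stated assumptions: the 32-byte line size is the one quoted in the post, and the struct name, its members, and the helper functions are all hypothetical, chosen only for illustration. C keeps structure members in declaration order, so variables that are used together can be grouped in one struct. Whether the whole struct really lands in a single line also depends on where the struct starts in memory, which this sketch does not control.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 32-byte cache lines, as in the post. Other CPUs differ. */
#define CACHE_LINE_BYTES 32u

/* Hypothetical variables that one routine touches close together in time.
 * Members of one struct are laid out in declaration order, unlike
 * separately declared variables, whose placement is up to the compiler. */
struct hot_data {
    int32_t alpha;
    int32_t beta;
    int32_t depth;
    int32_t ply;
    int32_t nodes;
    int32_t best_move;
    int32_t best_score;
    int32_t in_check;
};

/* Number of values of the given size that fit in one cache line. */
static unsigned values_per_line(unsigned value_bytes)
{
    return CACHE_LINE_BYTES / value_bytes;
}

/* Reads that an already-filled line can serve: every value but the first. */
static unsigned reads_saved_per_line(unsigned value_bytes)
{
    return values_per_line(value_bytes) - 1u;
}

/* Bytes from the start of the first member to the end of the last one. */
static size_t hot_data_span(void)
{
    return offsetof(struct hot_data, in_check) + sizeof(int32_t);
}

/* Sanity check: the grouped variables are no larger than one line. */
static void check_hot_data_layout(void)
{
    assert(hot_data_span() <= CACHE_LINE_BYTES);
}
```

With 4-byte values, values_per_line(4) gives the number of values one line fill brings in, and reads_saved_per_line(4) gives how many later reads that fill can serve from cache. Calling check_hot_data_layout() once at startup is a cheap way to notice when someone adds a member and the group outgrows a line.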
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.