Author: h.g.muller
Date: 13:52:27 02/07/06
Go up one level in this thread
OK, I already discovered my mistake, see other message. But about the Cray vs PC: sure I can't get that memory bandwidth out of my PC, but the point is that I don't NEED such memory bandwidth on a PC. The code on the Cray was vectorized by breaking it into loops that only did a single operation and you had to stream the results to and from memory between operations. The PC works best if you do as much as possible in a loop: combine all loops that run in the same direction through your data, so that, once data is brought in the cache, you chew it as much as you can before it is flushed. Most algorithms allow you to do some 10 operations on the data while it resides in L1, and that reduces the required bandwidth by a factor of 10 compared to what the Cray would need for the same computation. And when you need large strides you simply work on a number of colums simulaneously to make sure you process the entire cache line.
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.