Author: Jeremiah Penery
Date: 21:12:46 09/02/03
On September 02, 2003 at 22:50:21, Robert Hyatt wrote:

>I have no real opteron numbers yet. But let me make up a couple of numbers
>just for discussion.
>
>local memory = 80ns (I hope, but don't expect to see that.)
>memory one hop away = 110ns. and I assume a max of 1 hop for small
>numbers of processors (2 for example).
>
>If I put a split block in local memory, it will run at a latency of 80ns. But
>the current code puts them all in one memory, so one processor runs at 80ns
>less collision loss, and the other runs at 110ns (again less collision loss).

But if you move half the data to the other processor's local memory, the average access speed is the same. Each processor has half its accesses local and half not.

The only way I can think to solve it is to copy the data into both processors' local memory. With small numbers of processors it may not be too difficult, but I think that would become increasingly difficult as the number of processors increased, not to mention it would consume more and more bandwidth.

>That is a significant loss. Go to 4 processors, which I use all the time, and
>it gets worse. No, it isn't a factor of 2. But as I said, I scrap for every
>2% speedup I can find. for 20-30-40% I would jump through _many_ hoops...

I don't totally disagree, but it's a *far* cry from the 20x speedup you were able to get from the Cray.
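
To put numbers on the averaging argument, here is a rough sketch using the made-up figures quoted above (80ns local, 110ns one hop away) for two processors. It assumes every processor touches every split block equally often, which is my simplification and not something measured; the function and layout names are only for illustration. The replicated case counts reads only and ignores the bandwidth needed to keep the copies in sync.

```python
# Made-up latencies from the quoted post, not measured Opteron numbers.
LOCAL_NS = 80.0
REMOTE_NS = 110.0


def average_latency(local_fraction, local_ns=LOCAL_NS, remote_ns=REMOTE_NS):
    """Mean access latency when `local_fraction` of accesses hit local memory."""
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


# Fraction of each processor's accesses that are local, per layout
# (assumption: accesses are spread uniformly over all split blocks).
layouts = {
    "all blocks on node 0": [1.0, 0.0],
    "half on each node": [0.5, 0.5],
    "copy on both nodes (reads only)": [1.0, 1.0],
}

results = {}
for name, local_fractions in layouts.items():
    per_cpu = [average_latency(f) for f in local_fractions]
    results[name] = (per_cpu, sum(per_cpu) / len(per_cpu))

for name, (per_cpu, mean) in results.items():
    print(f"{name}: per-CPU {per_cpu} ns, mean {mean} ns")
```

Under these assumptions, putting everything in one memory and splitting it evenly both average 95ns across the two processors; the split only evens out which processor pays. Only the replicated layout gets both processors down to 80ns, and only for reads.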
Last modified: Thu, 15 Apr 21 08:11:13 -0700