Author: Robert Hyatt
Date: 09:06:46 09/03/03
On September 03, 2003 at 00:12:46, Jeremiah Penery wrote:

>On September 02, 2003 at 22:50:21, Robert Hyatt wrote:
>
>>I have no real opteron numbers yet. But let me make up a couple of numbers
>>just for discussion.
>>
>>local memory = 80ns (I hope, but don't expect to see that.)
>>memory one hop away = 110ns. and I assume a max of 1 hop for small
>>numbers of processors (2 for example).
>>
>>If I put a split block in local memory, it will run at a latency of 80ns. But
>>the current code puts them all in one memory, so one processor runs at 80ns
>>less collision loss, and the other runs at 110ns (again less collision loss).
>
>But if you move half the data to the other processor's local memory, the average
>access speed is the same. Each processor has half its accesses local and half
>not. The only way I can think to solve it is to copy the data into both
>processors' local memory. With small numbers of processors it may not be too
>difficult, but I think that would become increasingly difficult as the number of
>processors increased, not to mention it would consume more and more bandwidth.
>

I copy _anyway_. The key is to copy to the local memory of the processor that
will actually use the data.

>>That is a significant loss. Go to 4 processors, which I use all the time, and
>>it gets worse. No, it isn't a factor of 2. But as I said, I scrap for every
>>2% speedup I can find. for 20-30-40% I would jump through _many_ hoops...
>
>I don't totally disagree, but it's a *far* cry from the 20x speedup you were
>able to get from the Cray.
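
The arithmetic behind the three placements discussed above can be sketched in a few lines of C. This is only an illustration using the made-up 80ns and 110ns figures from the post. The names avg_latency_ns and remote_penalty_percent are hypothetical, not taken from any real program, and the model ignores the collision loss mentioned above.

```c
#include <assert.h>

/* Made-up latencies from the post, in nanoseconds (not measurements). */
#define LOCAL_NS  80.0
#define REMOTE_NS 110.0   /* memory one hop away */

/* Hypothetical helper: average latency one processor sees when
   local_fraction of its split-block accesses hit its own local memory
   and the rest go one hop away. Collision loss is ignored. */
static double avg_latency_ns(double local_fraction)
{
    assert(local_fraction >= 0.0 && local_fraction <= 1.0);
    return local_fraction * LOCAL_NS + (1.0 - local_fraction) * REMOTE_NS;
}

/* Hypothetical helper: extra latency of a one-hop access relative to a
   local one, in percent. This applies to these accesses only, not to
   overall program speed. */
static double remote_penalty_percent(void)
{
    return (REMOTE_NS - LOCAL_NS) / LOCAL_NS * 100.0;
}
```

Under these assumptions, avg_latency_ns(1.0) and avg_latency_ns(0.0) give the two processors in the "all in one memory" case (80ns and 110ns), avg_latency_ns(0.5) gives the 95ns each processor would see if half the data were simply moved to the other node, and avg_latency_ns(1.0) again describes the case where each split block is copied into the local memory of the processor that uses it.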
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.