Author: Robert Hyatt
Date: 16:45:43 08/27/03
On August 27, 2003 at 18:31:30, Jeremiah Penery wrote:

>On August 27, 2003 at 17:17:39, Robert Hyatt wrote:
>
>>On August 26, 2003 at 18:38:34, Jeremiah Penery wrote:
>>
>>>On August 26, 2003 at 11:42:03, Robert Hyatt wrote:
>>>
>>>>On August 26, 2003 at 03:00:52, Johan de Koning wrote:
>>>>
>>>>>On August 25, 2003 at 18:04:29, Robert Hyatt wrote:
>>>>>
>>>>>>A single cpu that will run crafty at 1M nps has a cache-cache and cache-memory
>>>>>>bandwidth of X bytes/second. A single cpu that runs crafty at 2M nps has
>>>>>>exactly twice the cache-cache and cache-memory bandwidth and twice the clock
>>>>>>frequency. A dual-cpu just needs two cpus, but the two cpus give twice the
>>>>>>cache-cache bandwidth, but _no_ improvement in cache-memory bandwidth.
>>>>>
>>>>>Nope, C2M bandwith is constant, regardless of n and f (hence constant :-).
>>>>
>>>>Actually it isn't. Some duals use interleaving. Some don't. All quads I
>>>>have here use 4-way interleaving to ramp up the bandwidth significantly.
>>>>
>>>>All machines are _not_ created equal...
>>>
>>>A great many single-CPU motherboards you can buy offer dual-channel memory
>>>and/or 4-way memory interleaving.
>>
>>1. I don't know much about dual-channel, but it doesn't sound related to
>>interleaving.
>
>It doubles available memory bandwidth. Not by trickery, but by actually having
>two full-width memory channels. Much like interleaving, it requires paired
>DIMMs to accomplish.
>
>>2. I have looked at most every single-cpu machine we have here and they
>>are _all_ plain memory boxes (non-interleaved). The easiest way to catch
>>interleaving is to find a requirement that you add two (or four) DIMMS, one
>>per "bank". If you have just one DIMM, or you have just one/two/three DIMM
>>slots, that MB isn't interleaving.
>
>I didn't say Dell sold single-cpu machines that had interleaving. But every
>motherboard that any of my friends or I have bought in the last few years offers
>4-way memory interleaving.
>Yeah, you can't turn it on unless you have 2+ DIMMs
>installed, but it's still there.
>
>>> I don't know if many of the SMP (2-CPU,
>>>specifically) boards for Intel or Athlon have that stuff or not. Opteron
>>>doesn't worry about it, as each CPU has its own dedicated (dual-channel) memory,
>>>and N CPUs have N times the aggregate bandwidth of a single-CPU machine.
>>
>>
>>Yes, but Opteron is NUMA, which has its _own_ problem issues to deal with.
>
>I don't see why NUMA should pose any problems. It's all handled by the hardware
>and the operating system anyway.

Not really. If I put something in my local memory I can get to it much faster
than if it is in the local memory of _another_ processor. Let's take a "split
block" in crafty, which contains _all_ the search-critical data. If it is not in
the local memory of the processor using that split block to do the search,
performance dies miserably. Since my split blocks are (at present) just an array
of big structures, they are in contiguous memory and they will exist in the
local memory of only one processor. All others will run dog slow, except for the
one with the quick access.

So yes, the hardware and O/S make it work, but it is up to the programmer to
make it work _efficiently_. In my case, I need to distribute "split blocks"
across processors, so that each processor has a few in its local memory. Then
when I need to give a processor something to do, I take the performance hit (a
short one) to copy from my local split block to his remote split block, but then
he runs like blazes with his local copy. Right now I don't have the first hit
(the copy), but the second (searching out of remote memory) is a killer since
only one processor has any local split blocks. That takes a design change to
correct, one that is not needed on a non-NUMA type architecture.

There are other issues that also cause problems, such as sharing data that
causes lots of cache transactions to keep things coherent.
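The per-processor split-block layout described above can be sketched roughly as follows. This is not Crafty's code: the names (split_block, pool, alloc_split_block, give_work), the field list, and the sizes are all hypothetical, and the pools here are ordinary static arrays. On a real NUMA box each pool would have to be placed in its node's local memory through an OS-specific call (for example numa_alloc_onnode() from libnuma on Linux), which this sketch does not do.

```c
#include <assert.h>
#include <string.h>

#define NUM_NODES       4   /* assumption: one NUMA node per processor */
#define BLOCKS_PER_NODE 8   /* assumption: "a few" split blocks per node */

/* Hypothetical stand-in for the search-critical data in a split block. */
typedef struct {
    int  in_use;
    int  owner_node;          /* node whose local memory should hold this block */
    int  ply;
    char search_state[256];   /* placeholder for board, move lists, bounds, ... */
} split_block;

/* One pool per node instead of a single contiguous array. Placement in
   node-local memory is NOT done here; only the bookkeeping is shown. */
static split_block pool[NUM_NODES][BLOCKS_PER_NODE];

/* Return a free block from the pool belonging to 'node', or NULL if none. */
split_block *alloc_split_block(int node)
{
    assert(node >= 0 && node < NUM_NODES);
    for (int i = 0; i < BLOCKS_PER_NODE; i++) {
        if (!pool[node][i].in_use) {
            pool[node][i].in_use = 1;
            pool[node][i].owner_node = node;
            return &pool[node][i];
        }
    }
    return NULL;
}

/* Hand work to a helper: pay for one copy from the parent's block into a
   block in the helper's own pool, so the helper then searches out of its
   own pool rather than the parent's. */
split_block *give_work(const split_block *parent, int helper_node)
{
    split_block *child = alloc_split_block(helper_node);
    if (child == NULL)
        return NULL;
    child->ply = parent->ply;
    memcpy(child->search_state, parent->search_state,
           sizeof child->search_state);
    return child;
}

/* Return a block to its pool. */
void free_split_block(split_block *b)
{
    b->in_use = 0;
}
```

The trade is the one named in the text: a single remote copy at hand-off time in exchange for every later access by the helper being local.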
>
>>Anytime you say "each processor has its own memory" you are talking NUMA.
>
>Yes, I know that.
>
>>I haven't seen any recent Intel MBs that didn't have interleaving. If you
>>don't do that, you take a big performance hit as memory has a hard enough
>>time keeping up with one cpu, much less two, without some sort of bandwidth
>>increasing trickery.
>>All of my quads have 4-way interleaving without exception, going back to my
>>original quad pentium-pro 200 box I still have (ALR Revolution).
>
>And all of my friends' and my single-cpu boxes have 4-way interleaving also.

It seems relatively pointless on a single-cpu machine, since cache is already
loaded in "burst mode". And that's the point of SDRAM/DDR/etc, to provide the
next N blocks much more quickly than the first block. On a dual or quad, it
makes a great deal more sense...
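For readers unfamiliar with the term, N-way interleaving can be pictured as rotating consecutive cache lines across N memory banks. The sketch below is only an illustration of that idea: the 64-byte line size and the simple modulo mapping are assumptions, and real chipsets may pick the bank from different address bits.

```c
#include <assert.h>
#include <stdint.h>

#define LINE_BYTES 64u   /* assumption: cache-line size in bytes */

/* Low-order interleaving: line 0 goes to bank 0, line 1 to bank 1, and so
   on, wrapping around after n_banks lines. */
unsigned bank_of(uint64_t addr, unsigned n_banks)
{
    assert(n_banks > 0);
    return (unsigned)((addr / LINE_BYTES) % n_banks);
}

/* Two requests can be serviced in parallel only if they hit different banks. */
int can_overlap(uint64_t addr_a, uint64_t addr_b, unsigned n_banks)
{
    return bank_of(addr_a, n_banks) != bank_of(addr_b, n_banks);
}
```

With 4 banks, requests for addresses 0 and 64 land on different banks and can overlap, while with a single bank every request queues behind the previous one.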
Last modified: Thu, 15 Apr 21 08:11:13 -0700