Author: Tony Werten
Date: 00:24:02 12/05/02
On December 04, 2002 at 23:23:32, Robert Hyatt wrote:

>On December 04, 2002 at 21:58:27, Jeremiah Penery wrote:
>
>>On December 04, 2002 at 21:13:40, Matt Taylor wrote:
>>
>>>On December 04, 2002 at 20:29:52, Bob Durrett wrote:
>>>
>>>>The recent threads shed some light on the issue of when one is more important
>>>>than another, but the answer is sketchy and seems to be "depends."
>>>>
>>>>For current chess-playing programs, which is more important? Latency or
>>>>bandwidth? Why?
>>>>
>>>>Is the answer different if multiple processors are used?
>>>>
>>>>Bob D.
>>>
>>>The answer is always "depends." It depends on how you access memory, how much
>>>memory you access, and how often you access memory.
>>>
>>>I'm going to make the simplification here that the CPU accesses memory
>>>directly; some of this work is actually done by the chipset, but that's just a
>>>technical detail and doesn't change any of the conclusions.
>>>
>>>For an algorithm to be sensitive to bandwidth, it must access memory (almost)
>>>serially. When the CPU issues a read/write request to main memory, it sends
>>>the address in two pieces: the row and the column. Sometimes the row and
>>>column bits are mangled for performance, but for simplicity let's assume that
>>>the row is the upper half of an address and the column is the lower half.
>>>
>>>The CPU doesn't actually transmit both row -and- column every time it
>>>accesses memory. The memory module has a row register that remembers which
>>>row you accessed previously. This isn't just an optimization, either; it
>>>reduces power requirements and has some other interesting effects for EE
>>>people. Anyway, when the row changes, the module is forced to "close" the
>>>current row and "open" the other row. The open process takes some time, as
>>>the cells in the row must be precharged. Avoiding a row change makes memory
>>>access faster. The column works in a similar fashion. The CL value for RAM is
>>>the CAS (column address strobe) latency, the latency of changing the column
>>>address.
>>>
>>>Now, if you're accessing memory randomly, or in some fashion that requires
>>>the row or column to change, you will often incur the CAS latency, the RAS
>>>latency, or both. This makes your algorithm latency-dependent.
>>>
>>>When multiple processors are used, the answer is a little more obscure. Now
>>>that both processors are competing for the same memory, each has less
>>>bandwidth. Does the algorithm spend a -lot- of time in between memory
>>>accesses? At the same time, the interleaved accesses from the two processors
>>>usually change the row and column, so the latency is incurred on many cycles.
>>>
>>>Notably, though, not all SMP systems are shared-bus. The upcoming x86-64
>>>Opteron chips from AMD include a bus per CPU.
>>
>>Current AthlonMP chipsets also have a separate bus per CPU. They use the same
>>EV6 bus as Alpha processors did (or still do?). The memory modules are
>>shared, whereas Hammer will have separate memory modules for each processor.
>
>The problem with that is that it turns into a NUMA architecture, which has its
>_own_ set of problems. One CPU connected to one memory module means that the
>other CPU can't get to it as efficiently...

IIRC they created a new buzzword for that: HyperTransport. I haven't seen any
tests yet of how well it really works, but it should improve the bandwidth.

Tony

>IE this doesn't offer one tiny bit of improvement over an SMP-type machine
>with shared memory... unless the algorithm is specifically designed to
>localize memory references and to duplicate data that is needed often by both
>threads...
>
>This might be an improvement for running two programs at once. For one
>program using two processors, NUMA offers additional challenges for the
>parallel programmer...
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.