Author: Matt Taylor
Date: 03:31:27 12/05/02
On December 04, 2002 at 21:58:27, Jeremiah Penery wrote:

>On December 04, 2002 at 21:13:40, Matt Taylor wrote:
>
>>On December 04, 2002 at 20:29:52, Bob Durrett wrote:
>>
>>>The recent threads shed some light on the issue of when one is more
>>>important than the other, but the answer is sketchy and seems to be "it
>>>depends."
>>>
>>>For current chess-playing programs, which is more important? Latency or
>>>bandwidth? Why?
>>>
>>>Is the answer different if multiple processors are used?
>>>
>>>Bob D.
>>
>>The answer is always "it depends." It depends on how you access memory, how
>>much memory you access, and how often you access memory.
>>
>>I'm going to make the simplification here that the CPU accesses memory
>>directly; some of the work described here is actually done by the chipset,
>>but that's just a technical detail and doesn't change any of the
>>conclusions.
>>
>>In order for an algorithm to be sensitive to bandwidth, it must be accessing
>>memory (almost) serially. When the CPU issues a read/write request to main
>>memory, it sends the address in two pieces: the row and the column.
>>Sometimes the row and column bits are mangled for performance, but for
>>simplicity let's assume that the row is the upper half of an address and the
>>column is the lower half.
>>
>>The CPU doesn't actually transmit both row -and- column every time it
>>accesses memory. The memory module has a row register that remembers which
>>row you accessed previously. This isn't just an optimization, either; it
>>reduces power consumption and has some other interesting effects for EE
>>people. Anyway, when the row changes, the module is forced to "close" the
>>current row and "open" the new row. The open process takes some time because
>>the cells in the row must be precharged. Avoiding a row change therefore
>>makes memory access faster. The column works in a similar fashion: the CL
>>value quoted for RAM is the CAS (column address strobe) latency, the latency
>>of changing the column address.
>>
>>Now, if you're accessing memory randomly, or in some fashion that requires
>>the row or column to change, you will often incur the CAS latency, the RAS
>>latency, or both. That makes your algorithm latency-dependent.
>>
>>When multiple processors are used, the answer is a little more obscure. Now
>>that both processors are competing for the same memory, each has less
>>bandwidth. (If the algorithm spends a -lot- of time in between memory
>>accesses, that may not matter.) At the same time, interleaving the accesses
>>of both processors usually changes the row and column, which means the
>>latency is incurred on many more cycles.
>>
>>Notably, though, not all SMP systems are shared-bus. The upcoming x86-64
>>Opteron chips from AMD include a bus per CPU.
>
>Current AthlonMP chipsets also have a separate bus per CPU. They use the same
>EV6 bus as Alpha processors did (or still do?). The memory modules are
>shared, whereas Hammer will have separate memory modules for each processor.

Yes and no. Each CPU has a dedicated bus to the memory controller (and to any
other CPU??). However, there is only one memory bus (the bus that physically
connects to the memory chips), and that bus is shared by all processors. Intel
has the same limitation, but Intel uses other tricks to further double the
memory bandwidth (and effectively make memory access no more costly on SMP
than on a single-CPU system). Unfortunately, AMD has not. (The AMD 760 MP and
760 MPX are the only Socket A SMP chipsets available, and neither has this
capability.)

The EV6 bus protocol is not the only thing AMD licensed from the Alpha 21264.
The designs are notably similar. I forget what else they copied, but I
remember that at least one other thing was actually licensed.

-Matt
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.