Author: Robert Hyatt
Date: 15:20:14 03/18/03
On March 18, 2003 at 17:46:10, Tom Kerrigan wrote:

>On March 18, 2003 at 16:37:35, Robert Hyatt wrote:
>
>>>>1. no interleaving, which means that the raw memory latency is stuck at
>>>>120+ns and stays there. Faster bus means nothing without interleaving,
>>>>if latency is the problem.
>>>
>>>Uh, wait a minute, didn't you just write a condescending post to me about how
>>>increasing bandwidth improves latency? (Which I disagree with...) You can't
>>>have it both ways.
>>>
>>>Faster bus speed improves both latency and bandwidth. How can it not?
>>
>>It doesn't affect random latency whatsoever. It does affect the time taken to
>>load a cache line, which does affect latency in a different way. However,
>>interleaving does even better: even though it doesn't change latency either,
>>it will load a cache line even faster.
>
>Are you kidding me? How can FSB speed _not_ affect latency?

Very simple. Latency is caused _in_ the memory system; only a tiny part of it
comes from the delay of shipping the data over the bus. If you ran the bus at
10GHz, you would _still_ see 120ns latency, because you can't read DRAM any
faster than that for the first read to a random location...

>If you have 133MHz CAS2 memory and a 100MHz FSB, it takes 2*(1/100M)=20ns +
>your northbridge overhead to do a random access. Increase the bus speed to
>133MHz, now the access takes 2*(1/133M)=15ns + northbridge overhead. So it
>gets 5ns faster. I don't see how it could possibly _not_ get faster. Where is
>the flaw in my logic?

Run the test. This discussion was held on r.g.c.p a while back, and the _same_
results were found: memory has 120ns latency no matter _what_ memory you use.
RDRAM is even slower in terms of latency. If you can get your memory to
sub-100ns latency, you've done a miracle in modern electronics. The problem is
that the time required to ship the data across the bus is a _small_ fraction
of the total transaction time. Driving it to zero still leaves that 120ns
that is spread over everything but the bus.

>Second, are you sure an entire cache line has to be filled before any of the
>data can be used?

No, but if you fire off _another_ memory access, _that_ one can't be started
until the previous cache line fill is completed. I thought I had said that;
if it wasn't clear, that was my fault. However, on average, half of the cache
line has to be filled before the requested data is available, since the
requested word is equally likely to land in the front half or the back half
of the line...

>I thought memory timings like 2-1-1-1 meant that the word that was requested
>took 2 cycles to access whereas the neighboring words then took 1 cycle each
>to access during the burst transfer. If the requested word is being sent
>first, you'd think the northbridge and processor would take advantage of that
>fact. Do you have any information to the contrary? I mean, if that _wasn't_
>the case, don't you think memory timings would be given as the total of all
>the numbers, e.g., 5 instead of 2-1-1-1? I mean, if the 2 isn't significant,
>why advertise it?

So far as I know, current cache controllers start at the front of a line.
There was talk a few years back about changing this, and I haven't tried to
follow it terribly closely, but the logic to start at the middle of a line
and then wrap back around to the front was (at the time) deemed way too
complex to deal with. I saw this when preliminary discussions about the
original Pentium Pro were being held. I don't recall how it all ended,
however.
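To put rough numbers on why the bus is such a small part of the bill, here is
a back-of-the-envelope model in C. Everything in it is an illustrative
assumption (120ns first-word latency, a 32-byte line moved 8 bytes per bus
clock, front-first fill), not a measurement of any particular chipset:

/* Back-of-the-envelope model of a cache line fill.  All numbers are
   illustrative assumptions: ~120ns for the first 8-byte word out of
   DRAM, then one bus clock per additional word, filled front-first. */
#include <stdio.h>

int main(void) {
    const double first_word_ns  = 120.0;   /* assumed random-access latency */
    const int    words_per_line = 4;       /* 32-byte line, 8-byte bus      */
    const double bus_mhz[]      = { 100.0, 133.0 };

    for (int i = 0; i < 2; i++) {
        double cycle_ns = 1000.0 / bus_mhz[i];
        /* full line fill: first-word latency + remaining words at bus speed */
        double fill_ns = first_word_ns + (words_per_line - 1) * cycle_ns;
        /* front-first fill: a word at a random position waits, on average,
           (words_per_line - 1) / 2 extra bus cycles before it arrives     */
        double wait_ns = first_word_ns + (words_per_line - 1) / 2.0 * cycle_ns;
        printf("%3.0fMHz bus: line fill %5.1fns, avg wait for data %6.2fns\n",
               bus_mhz[i], fill_ns, wait_ns);
    }
    return 0;
}

In this model, pushing the bus from 100MHz to 133MHz moves the average wait
from about 135ns to about 131ns. The 120ns spent inside the memory system
itself doesn't move at all.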
The point of modern burst-mode memory is that the first 8 bytes take roughly
120ns to fetch, then the next N bytes come across at near bus speed. But that
first delay is the killer I was talking about. The original timing number is
the raw overall latency to start reading a particular memory column. That is
where the derivatives such as fast page mode, EDO, SDRAM, RDRAM, and so forth
started: by reading more data internally and, after suffering thru the first
huge latency delay, dumping the remainder of the requested data as quickly as
possible. That is why RDRAM is so damned bad for chess as well: the initial
latency is even higher, although the streaming bandwidth is also much higher
once you suffer thru the initial start-up delay.

There are some good latency analysis programs running around on the net, for
anyone wanting to see that their "60ns DRAM memory" is really 120ns when you
factor in the entire pipeline from memory chip to processor, thru two levels
of cache, the bus, the bridge, etc. (a bare-bones sketch of such a test is
below). Running up the bus speed can help, but only on the small fraction of
the time that is actually spent in the bus transaction. And that time really
is small...

>-Tom
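For anyone who wants to try it, here is a bare-bones sketch of such a latency
test in C: a pointer chase through a buffer much larger than the caches,
where each load depends on the previous one. The buffer size and step count
are arbitrary choices, and the shuffle (Sattolo's algorithm) just guarantees
the pointers form one big cycle so the chase visits the buffer in random
order:

/* Bare-bones pointer-chasing latency test: every load depends on the
   previous one, so the time per step approximates the raw random-access
   latency of the whole memory pipeline, caches and bridge included. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define SLOTS ((size_t)(16 * 1024 * 1024) / sizeof(void *)) /* 16MB >> cache */
#define STEPS 10000000L

int main(void) {
    void **buf = malloc(SLOTS * sizeof(void *));
    size_t i;
    if (!buf) return 1;

    /* Start from the identity, then apply Sattolo's shuffle so the
       pointers form one big random cycle -- no pattern to prefetch. */
    for (i = 0; i < SLOTS; i++) buf[i] = &buf[i];
    srand(1234);
    for (i = SLOTS - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;   /* j < i keeps it a single cycle */
        void *t = buf[i]; buf[i] = buf[j]; buf[j] = t;
    }

    void **p = (void **)buf[0];
    clock_t t0 = clock();
    for (long n = 0; n < STEPS; n++)
        p = (void **)*p;                 /* dependent load chain */
    clock_t t1 = clock();

    printf("%.1f ns per dependent load (p=%p)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC * 1e9 / STEPS, (void *)p);
    free(buf);
    return 0;
}

Once the buffer is well past the L2 size, something in the neighborhood of
the 120ns per step discussed above is what you should see; shrink SLOTS until
it fits in cache and the number collapses to a few ns. Printing p at the end
keeps the compiler from optimizing the loop away.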