Author: Matt Taylor
Date: 12:58:37 03/19/03
On March 19, 2003 at 13:52:31, Robert Hyatt wrote:

>On March 19, 2003 at 12:53:42, Matt Taylor wrote:
>
>>On March 18, 2003 at 23:09:08, Robert Hyatt wrote:
>>
>>>On March 18, 2003 at 19:45:43, Tom Kerrigan wrote:
>>>
>>>>On March 18, 2003 at 18:20:14, Robert Hyatt wrote:
>>>>
>>>>>On March 18, 2003 at 17:46:10, Tom Kerrigan wrote:
>>>>>
>>>>>>On March 18, 2003 at 16:37:35, Robert Hyatt wrote:
>>>>>>
>>>>>>>>>1. no interleaving, which means that the raw memory latency is stuck at
>>>>>>>>>120+ns and stays there. Faster bus means nothing without interleaving,
>>>>>>>>>if latency is the problem.
>>>>>>>>
>>>>>>>>Uh, wait a minute, didn't you just write a condescending post to me about how
>>>>>>>>increasing bandwidth improves latency? (Which I disagree with...) You can't have
>>>>>>>>it both ways.
>>>>>>>>
>>>>>>>>Faster bus speed improves both latency and bandwidth. How can it not?
>>>>>>>
>>>>>>>It doesn't affect random latency whatsoever. It does affect the time taken to
>>>>>>>load a cache line. Which does affect latency in a different way. However,
>>>>>>>interleaving does even better as even though it doesn't change latency either,
>>>>>>>it will load a cache line even faster.
>>>>>>
>>>>>>Are you kidding me? How can FSB speed _not_ affect latency?
>>>>>
>>>>>Very simple. Latency is caused _in_ the memory system, only a tiny part of
>>>>>latency is caused by the delay of shipping the data over the bus. If you ran
>>>>>the bus
>>>>...
>>>>>Run the test. This discussion was held on r.g.c.p a while back. And the _same_
>>>>>results were found. Memory has 120ns latency no matter _what_ memory you
>>>>>use. RDRAM is even slower in terms of latency. If you can get your memory to
>>>>>sub-100ns latency, you've done a miracle in modern electronics.
>>>>
>>>>I guess I'm sitting in front of one miraculous computer, then, because it can
>>>>randomly access a word in 75ns. Just ran the test. (RDRAM, BTW.)
>>>
>>>Yes you are.
>>>You have the fastest single CPU on the planet. Notice that to
>>>do this test, you have to access a byte, skip down 128 bytes and access another
>>>and repeat this for a _long_ set of addresses. If you _still_ get 75ns
>>>you _do_ have the fastest PC latency ever reported by any serious tester.
>>
>>AMD thinks so too. The most accurate figure I've found is about 70 ns for the
>>on-die memory controller that Clawhammer has. (I saw some claims of sub-40 ns,
>>but I find that hard to believe.)
>>
>>>>If you have a 133MHz DIMM that's rated at 2-1-1-1, it can obviously access a
>>>>word in 15ns.
>>>
>>>I don't believe 15ns for a second. Just look at current specs for DRAM and
>>>tell me how that is going to happen? Again, look at any memory benchmarking
>>>done on the internet by folks that do this for a living. _nobody_ has reported
>>>sub 100ns latency for any test I have seen, when talking about the PC. Or
>>>when talking about a sixty million dollar Cray.
>>
>>15 ns is believable. You must remember that ram is configured as rows and
>>columns. The full 100-120 ns is the latency of opening a new row and reading.
>>You and Tom seem to be talking about different things here. A completely random
>>access is going to hit RAS and stall the full 100-120 ns. Reloading the column
>>will only hit CAS and stall for 15 ns.
>
>The only memory latency that is interesting is "random access latency".
>Anything else plays right into things like RDRAM and makes it look great,
>when its random access latency is bad. The DDR memory in my dual xeon is
>150ns which looks poor compared to the 130ns SDRAM in my much cheaper PIII
>laptop.

Your "cheaper PIII laptop" doesn't need all the extra baggage that comes with
dual processors.

>>>> If the system gets that word in 75ns (ignoring RDRAM vs. DIMM
>>>>latency for now) that means 20% of the latency is from the memory and 80% (not
>>>>"a tiny part") is from "shipping the data over the bus" (and through the
>>>>northbridge).
>>>>Conventional wisdom says there's a 10ns wire/pin delay for a
>>>>signal going into or out of a chip, so into northbridge + out of northbridge +
>>>>into processor = 30ns. That means 30ns of processing is done on the northbridge
>>>>and processor. That's why everybody is so worked up about Hammer's on-die memory
>>>>controller--it reduces memory latency by, well, somewhere between 20 and 50ns,
>>>>or roughly 50%.
>>>>
>>>>End of today's lecture...
>>>
>>>Now to get some _real_ data before giving the _next_ lecture. As I said,
>>>access 1M bytes, with a 128 byte stride so cache-line pre-fetching won't
>>>artificially bias the result downward.
>>>
>>>I'll try to run this on a group of dual xeons here tomorrow, starting with my
>>>2.8's and also trying the 3.06's.
>>>
>>>Several of us did this on R.G.C.P a few months back however, and 120+ ns
>>>was the _best_ time reported when the test was run correctly.
>>
>>I got 133 ns as well. Aaron was running tests like crazy this morning on his
>>nForce 2, and he reported times as low as 70 ns. I find that -very- impressive.
>>Of course, that was with massive memory overclocking.
>>
>>-Matt
>
>Still it is the fastest time I have _ever_ seen for memory latency. The X86
>pipeline to memory is _so_ long. Going thru the mmu, to L1, to L2 to the bus,
>to the memory controller, to the chip, pulling that off in 70ns reliably is
>_remarkable_. Particularly when you consider that the latency for 60 million
>dollar computers like the Cray T90 is over 100ns. The infamous Cray-2, the
>first machine with 32 gigabytes of RAM had a similar latency. 2.1ns clock,
>50 clock cycle latency.

I believe the MMU comes after L1 & L2 to reduce latency of cache accesses. I
thought it was awkward at first (reloading cr3 therefore invalidates the entire
cache), but it makes sense.

>So 70 seems not only good, but _incredibly_ good.
>lmbench is pretty good at measuring this accurately, so long as you are sure
>you tell it to use way more RAM than will fit into cache.
>
>I broke it on my quad 700 due to the 1MB L2, but when I had it use 128mb for
>the memory tests, things dropped back to 130ns for that box as well, but my
>quad 700 used SDRAM which seems to be the best there is right now.

Agreed, 70 ns is superb. AMD talks about Clawhammer's 80 ns latency being a big
deal. I can't get a solid figure (only guesses), but the 70 ns seems accurate.
When Opteron comes out, I plan to get one, and I'll see about testing memory
latency at that point.

-Matt
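
For anyone wanting to repeat the test described in the quotes above (touch one location, skip 128 bytes, touch the next, over far more memory than fits in cache), here is a minimal sketch in C. It is not the code anyone in this thread ran, and it is not lmbench. The names build_chain, chase, ns_per_access and STRIDE_BYTES are made up for illustration. Two assumptions go beyond what the thread describes: the accesses are chained (each slot stores the index of the next one, so every load has to wait for the previous load), and there is an optional shuffled visiting order, which is meant to keep a stride-detecting prefetcher from hiding the latency.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

enum { STRIDE_BYTES = 128 };                 /* stride quoted in the thread */
#define STEP (STRIDE_BYTES / sizeof(size_t)) /* same stride, in array elements */

static volatile size_t sink; /* keeps the compiler from discarding the walk */

/* Allocate slots * STRIDE_BYTES bytes. Slot i is element i * STEP and holds
 * the element index of the next slot to visit. All slots form one cycle.
 * shuffle == 0: visit slots in address order (plain 128-byte stride).
 * shuffle != 0: visit slots in a shuffled order (assumption, see lead-in).
 * Note: rand() may have a small RAND_MAX on some platforms, which weakens
 * the shuffle for large buffers but still leaves a single full cycle. */
size_t *build_chain(size_t slots, int shuffle)
{
    assert(slots > 0);
    size_t *order = malloc(slots * sizeof *order);
    size_t *chain = malloc(slots * STRIDE_BYTES);
    if (order == NULL || chain == NULL) {
        free(order);
        free(chain);
        return NULL;
    }
    for (size_t i = 0; i < slots; i++)
        order[i] = i;
    if (shuffle) {
        for (size_t i = slots - 1; i > 0; i--) { /* Fisher-Yates shuffle */
            size_t j = (size_t)rand() % (i + 1);
            size_t tmp = order[i];
            order[i] = order[j];
            order[j] = tmp;
        }
    }
    for (size_t i = 0; i < slots; i++)
        chain[order[i] * STEP] = order[(i + 1) % slots] * STEP;
    free(order);
    return chain;
}

/* Follow the chain for the given number of dependent loads, starting at
 * slot 0, and return the element index reached. */
size_t chase(const size_t *chain, size_t steps)
{
    size_t idx = 0;
    while (steps--)
        idx = chain[idx];
    return idx;
}

/* Average nanoseconds per load, or -1.0 if allocation fails.
 * Uses clock(), which reports CPU time at coarse resolution, so steps
 * should be large (millions) for a meaningful figure. */
double ns_per_access(size_t slots, size_t steps, int shuffle)
{
    assert(steps > 0);
    size_t *chain = build_chain(slots, shuffle);
    if (chain == NULL)
        return -1.0;
    sink = chase(chain, slots); /* warm-up: visits every slot once */
    clock_t t0 = clock();
    sink = chase(chain, steps);
    clock_t t1 = clock();
    free(chain);
    return (double)(t1 - t0) / CLOCKS_PER_SEC * 1e9 / (double)steps;
}
```

Calling ns_per_access((128u << 20) / STRIDE_BYTES, 20000000, 1) from main and printing the result walks a 128 MB buffer, the size Hyatt mentions using with lmbench. The figure you get depends entirely on the machine, and on current hardware TLB misses will be part of it, so treat it as a rough number rather than a clean DRAM latency.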
Last modified: Thu, 15 Apr 21 08:11:13 -0700