Author: Robert Hyatt
Date: 14:56:52 07/17/03
On July 17, 2003 at 07:51:05, Vincent Diepeveen wrote:

>seemingly hyatt has been asking around:

I haven't been asking around at all. No idea what you are rambling about now, nor what you are "on" at the moment...

>
>http://www.talkchess.com/forums/1/message.html?306766
>
>On July 17, 2003 at 00:26:21, Keith Evans wrote:
>
>>On July 16, 2003 at 22:40:10, Vincent Diepeveen wrote:
>>
>>>On July 16, 2003 at 13:04:40, Keith Evans wrote:
>>>
>>>>On July 16, 2003 at 07:20:50, Vincent Diepeveen wrote:
>>>>
>>>>>On July 16, 2003 at 00:44:34, Keith Evans wrote:
>>>>>
>>>>>>On July 16, 2003 at 00:29:43, Robert Hyatt wrote:
>>>>>>
>>>>>>>On July 16, 2003 at 00:05:29, Keith Evans wrote:
>>>>>>>
>>>>>>>>On July 15, 2003 at 23:35:30, Robert Hyatt wrote:
>>>>>>>>
>>>>>>>>>On July 15, 2003 at 23:05:37, Vincent Diepeveen wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>Now i can disprove again the 130ns figure that Bob keeps giving here for dual
>>>>>>>>>>machines and something even faster than that for single cpu (up to 60ns or
>>>>>>>>>>something). Then i'm sure he'll be modifying soon his statement something like
>>>>>>>>>>to "that it is not interesting to know the time of a hashtable lookup, because
>>>>>>>>>>that is not interesting to know; instead the only scientific interesting thing is
>>>>>>>>>>to know is how much bandwidth a machine can actually achieve".
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>What is _interesting_ is the fact that you are incapable of even recalling
>>>>>>>>>the numbers I posted.
>>>>>>>>>
>>>>>>>>>to wit:
>>>>>>>>>
>>>>>>>>>dual xeon 2.8ghz, 400mhz FSB. 149ns latency
>>>>>>>>>
>>>>>>>>>PIII/750 laptop, SDRAM. 125ns.
>>>>>>>>>
>>>>>>>>>Aaron posted the 60+ ns numbers for his overclocked athlon. I assume his
>>>>>>>>>numbers are as accurate as mine since he _did_ run lmbench, rather than
>>>>>>>>>something with potential bugs.
>>>>>>>>>
>>>>>>>>>I can post bandwidth numbers if you want, but that has nothing to do with
>>>>>>>>>latency, as those of us understanding architecture already know.
>>>>>>>>>
>>>>>>>>
>>>>>>>>Can you run lmbench and give the latency numbers for different stride sizes?
>>>>>>>>Then you could quote numbers from cache,...
>>>>>>>>
>>>>>>>
>>>>>>>Here's my laptop data. L1 seems to be 4 clocks. L2 9 clocks, memory
>>>>>>>at 130ns. This is a PIII/750MHz machine with SDRAM. I just ran it again
>>>>>>>to produce these numbers.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>Host      OS            Mhz  L1 $   L2 $   Main mem Guesses
>>>>>>>--------- ------------- ---- ------ ------ -------- -------
>>>>>>>scrappy   Linux 2.4.20  744  4.0370 9.4300 130.2
>>>>>>>
>>>>>>>>In the lmbench paper they have a nice graph like this.
>>>>>>>
>>>>>>>
>>>>>>>Is the above what you want?
>>>>>>
>>>>>>I think that it's as close as you're going to get. The most important thing is
>>>>>>that 130 [ns] is the largest number. And wouldn't that be a little bit
>>>>>>pessimistic even for chess hash tables?
>>>>>
>>>>>this is optimistic, because those latency numbers are sequential latency
>>>>>numbers. Already opened gates at the RAM you can read faster from than if you
>>>>>must open a new one at a random spot.
>>>>>
>>>>>Trivially hashtables you have not opened it at that random spot yet.
>>>>>
>>>>>That is an additional latency extra that adds to this 130. Most likely that
>>>>>will add up to like above 280 ns up to 400 ns for dual Xeons DDR ram 133Mhz.
>>>>>
>>>>>Best regards,
>>>>>Vincent
>>>>
>>>>Let's take a simple example for starters:
>>>>
>>>>Say that you read from memory location 0x00000000, then 0x01000000, then
>>>>0x02000000.
>>>>
>>>>Do you define this as sequential? What hardware mechanism makes the accesses at
>>>>0x01000000 and 0x02000000 occur faster than the first access to location
>>>>0x00000000?
>>>
>>>http://www.vml.co.uk/Data/ddr_256mbit.pdf
>>>
>>>It describes it a bit. In this case for DDR ram.
>>>
>>>See for example page 8 the one last line.
>>>
>>>"200 clock cycles are required between the DLL reset and any read command"
>>>
>>>
>>>then in page 17 the explanation:
>>> "the read command is used to initiate a burst read access to an active row.
>>> ... if auto precharge is selected, the row being accessed will be precharged
>>>at the end of the read burst; if auto precharge is not selected then the row
>>>will remain opened for subsequent accesses"
>>>
>>>
>>>and don't forget to check out page 21.
>>>
>>>and so on. there is enough data there.
>>
>>Do you know what a DLL is? It's a delay locked loop - something similar but
>>simpler than a PLL (phase locked loop.) These are often used in digital circuits
>>for things like doubling a clock frequency, getting delays which are a fraction
>>of a clock long,... (Xilinx has some good material on this which you can check
>>out.)
>>
>>Now the quote that you gave from page 8 is from the section "Initialization -
>>DDR SDRAMs must be powered up and initialized in a predefined manner" I don't
>>know why you think that this has anything to do with normal reads or writes. The
>>200 ns that you refer to is typically a one time operation.
>>
>>I already know about the second item that you quoted. Notice that my addresses
>>were not in the same row. So this does not apply.
>>
>>You might look at the part that says:
>>"3. BA0-BA1 provide bank address and A0-A12 provide row address.
>> 4. BA0-BA1 provide bank address; A0-Ai provide column address (where i=8 for
>>x16, 9 for x8 and 11 for x4 except A10); A10 HIGH enables the auto precharge
>>feature (nonpersistent), A10 LOW disables the auto precharge feature"
>>
>>Just looking at that do you think that all of the addresses that I gave are in
>>the same row?
>>
>>If not, then doesn't that imply that the row will have to be opened for each
>>successive access?
>>
>>I did some DRAM controller design about 10 years ago, and the internals haven't
>>really changed that much.
>>I've never done any DDR design but from a quick look
>>here's my SWAG at it:
>>
>>Let's assume that we need to do an ACTIVE then READ then PRECHARGE with CL=2 DDR
>>RAM operating with a clock frequency of 133 MHz. I believe that this adds up to
>>about 9 clocks which would be almost 70 ns. See tRCD (18 ns) + tRP (18 ns) plus
>>the CL=2 read access. Then you have to add in the additional delays inside of
>>the chipset and the processor.
>>
>>Please point out the missing ns in the above.
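Keith's estimate can be checked with a few lines of arithmetic. This is an illustrative sketch, not a model of any particular chipset: it assumes a 7.5 ns clock (133 MHz, i.e. DDR266), the tRP = tRCD = 18 ns and CL=2 figures quoted above, a burst length of 4, and a row miss (the bank holds a different open row, so it must be precharged before the new row is activated). The function names are hypothetical.

```python
import math


def to_clocks(t_ns: float, tck_ns: float) -> int:
    """Round a datasheet minimum time up to whole clock cycles."""
    return math.ceil(t_ns / tck_ns)


def row_miss_latency(tck_ns: float = 7.5,    # 133 MHz clock (assumed DDR266)
                     trp_ns: float = 18.0,    # PRECHARGE -> ACTIVE
                     trcd_ns: float = 18.0,   # ACTIVE -> READ
                     cas_latency: int = 2,    # READ -> first data, in clocks
                     burst_length: int = 4):  # data beats per READ (assumed)
    """Return ((clocks, ns) to first data, (clocks, ns) to end of burst).

    Counts DRAM-side time only; chipset and CPU delays are not included.
    """
    to_first_data = (to_clocks(trp_ns, tck_ns)     # close the old row
                     + to_clocks(trcd_ns, tck_ns)  # open the new row
                     + cas_latency)                # column read
    burst_clocks = burst_length // 2               # DDR moves two beats per clock
    to_end = to_first_data + burst_clocks
    return (to_first_data, to_first_data * tck_ns), (to_end, to_end * tck_ns)


if __name__ == "__main__":
    first, done = row_miss_latency()
    print(f"first data: {first[0]} clocks = {first[1]} ns")
    print(f"burst done: {done[0]} clocks = {done[1]} ns")
```

Under these assumptions the 18 ns timings each round up to 3 clocks, giving 8 clocks (60 ns) to the first data word and 10 clocks (75 ns) once a 4-beat burst has been transferred, which brackets the "about 9 clocks, almost 70 ns" figure. As Keith says, chipset and processor delays come on top of this.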
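Keith's point about rows can be illustrated by splitting his three example addresses into bank, row and column fields. The field widths come from the datasheet text he quotes for a x8 part (BA0-BA1 for bank, A0-A12 for row, A0-A9 for column). The 64-bit (8-byte) data bus and the simple linear row | bank | column | byte layout are hypothetical assumptions; real chipsets choose their own, often interleaved, mappings.

```python
def decode(addr: int, col_bits: int = 10, bank_bits: int = 2,
           row_bits: int = 13, bus_bytes: int = 8):
    """Split a physical address into (row, bank, column).

    Hypothetical linear mapping, high to low bits:
    row | bank | column | byte within the bus word.
    bus_bytes is assumed to be a power of two.
    """
    offset_bits = (bus_bytes - 1).bit_length()  # 8-byte bus -> 3 bits
    a = addr >> offset_bits                      # drop byte-in-word bits
    column = a & ((1 << col_bits) - 1)
    a >>= col_bits
    bank = a & ((1 << bank_bits) - 1)
    a >>= bank_bits
    row = a & ((1 << row_bits) - 1)
    return row, bank, column


if __name__ == "__main__":
    for addr in (0x00000000, 0x01000000, 0x02000000):
        row, bank, column = decode(addr)
        print(f"0x{addr:08X}: row {row}, bank {bank}, column {column}")
```

Under this assumed layout all three addresses fall in bank 0 but in rows 0, 512 and 1024, so each access must close one row and open another, which is the row-miss case the timing estimate covers.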