Computer Chess Club Archives

Subject: Re: Precharging at DDR ram

Author: Robert Hyatt
Date: 14:56:52 07/17/03
On July 17, 2003 at 07:51:05, Vincent Diepeveen wrote:

>seemingly hyatt has been asking around:

I haven't been asking around at all.  No idea what you are rambling
about now, nor what you are "on" at the moment...


>
>http://www.talkchess.com/forums/1/message.html?306766
>
>On July 17, 2003 at 00:26:21, Keith Evans wrote:
>
>>On July 16, 2003 at 22:40:10, Vincent Diepeveen wrote:
>>
>>>On July 16, 2003 at 13:04:40, Keith Evans wrote:
>>>
>>>>On July 16, 2003 at 07:20:50, Vincent Diepeveen wrote:
>>>>
>>>>>On July 16, 2003 at 00:44:34, Keith Evans wrote:
>>>>>
>>>>>>On July 16, 2003 at 00:29:43, Robert Hyatt wrote:
>>>>>>
>>>>>>>On July 16, 2003 at 00:05:29, Keith Evans wrote:
>>>>>>>
>>>>>>>>On July 15, 2003 at 23:35:30, Robert Hyatt wrote:
>>>>>>>>
>>>>>>>>>On July 15, 2003 at 23:05:37, Vincent Diepeveen wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>Now i can disprove again the 130ns figure that Bob keeps giving here for dual
>>>>>>>>>>machines and something even faster than that for single cpu (up to 60ns or
>>>>>>>>>>something). Then i'm sure he'll soon be modifying his statement to something
>>>>>>>>>>like "that it is not interesting to know the time of a hashtable lookup;
>>>>>>>>>>instead the only scientifically interesting thing to know is how much
>>>>>>>>>>bandwidth a machine can actually achieve".
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>What is _interesting_ is the fact that you are incapable of even recalling
>>>>>>>>>the numbers I posted.
>>>>>>>>>
>>>>>>>>>to wit:
>>>>>>>>>
>>>>>>>>>dual xeon 2.8ghz, 400mhz FSB.  149ns latency
>>>>>>>>>
>>>>>>>>>PIII/750 laptop, SDRAM.  125ns.
>>>>>>>>>
>>>>>>>>>Aaron posted the 60+ ns numbers for his overclocked athlon.  I assume his
>>>>>>>>>numbers are as accurate as mine since he _did_ run lm_bench, rather than
>>>>>>>>>something with potential bugs.
>>>>>>>>>
>>>>>>>>>I can post bandwidth numbers if you want, but that has nothing to do with
>>>>>>>>>latency, as those of us understanding architecture already know.
>>>>>>>>>
>>>>>>>>
>>>>>>>>Can you run lmbench and give the latency numbers for different stride sizes?
>>>>>>>>Then you could quote numbers from cache,...
>>>>>>>>
>>>>>>>
>>>>>>>Here's my laptop data.  L1 seems to be 4 clocks.  L2 9 clocks, memory
>>>>>>>at 130ns.  This is a PIII/750MHz machine with SDRAM.  I just ran it again
>>>>>>>to produce these numbers.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>Host                 OS   Mhz   L1 $   L2 $    Main mem    Guesses
>>>>>>>--------- -------------   ---   ----   ----    --------    -------
>>>>>>>scrappy    Linux 2.4.20   744 4.0370 9.4300       130.2
>>>>>>>
>>>>>>>>In the lmbench paper they have a nice graph like this.
>>>>>>>
>>>>>>>
>>>>>>>Is the above what you want?
>>>>>>
>>>>>>I think that it's as close as you're going to get. The most important thing is
>>>>>>that 130 [ns] is the largest number. And wouldn't that be a little bit
>>>>>>pessimistic even for chess hash tables?
>>>>>
>>>>>this is optimistic, because those latency numbers are sequential latency
>>>>>numbers. Gates that are already open at the RAM can be read from faster than
>>>>>a new one that must first be opened at a random spot.
>>>>>
>>>>>Trivially, with hashtables the gate at that random spot has not been opened yet.
>>>>>
>>>>>That is an additional latency that adds to this 130. Most likely it will
>>>>>add up to something above 280 ns, up to 400 ns, for dual Xeons with DDR ram at 133Mhz.
>>>>>
>>>>>Best regards,
>>>>>Vincent
>>>>
>>>>Let's take a simple example for starters:
>>>>
>>>>Say that you read from memory location 0x00000000, then 0x01000000, then
>>>>0x02000000.
>>>>
>>>>Do you define this as sequential? What hardware mechanism makes the accesses at
>>>>0x01000000 and 0x02000000 occur faster than the first access to location
>>>>0x00000000?
>>>
>>>http://www.vml.co.uk/Data/ddr_256mbit.pdf
>>>
>>>It describes it a bit. In this case for DDR ram.
>>>
>>>See for example page 8, the last line.
>>>
>>>"200 clock cycles are required between the DLL reset and any read command"
>>>
>>>
>>>then in page 17 the explanation:
>>>  "the read command is used to initiate a burst read access to an active row.
>>>   ... if auto precharge is selected, the row being accessed will be precharged
>>>at the end of the read burst; if auto precharge is not selected  then the row
>>>will remain opened for subsequent accesses"
>>>
>>>
>>>and don't forget to check out page 21.
>>>
>>>and so on. there is enough data there.
>>
>>Do you know what a DLL is? It's a delay locked loop - something similar to but
>>simpler than a PLL (phase locked loop.) These are often used in digital circuits
>>for things like doubling a clock frequency, getting delays which are a fraction
>>of a clock long,... (Xilinx has some good material on this which you can check
>>out.)
>>
>>Now the quote that you gave from page 8 is from the section "Initialization -
>>DDR SDRAMs must be powered up and initialized in a predefined manner". I don't
>>know why you think that this has anything to do with normal reads or writes. The
>>200 clock cycles that you refer to are typically a one time operation.
>>
>>I already know about the second item that you quoted. Notice that my addresses
>>were not in the same row. So this does not apply.
>>
>>You might look at the part that says:
>>"3. BA0-BA1 provide bank address and A0-A12 provide row address.
>> 4. BA0-BA1 provide bank address; A0-Ai provide column address (where i=8 for
>>x16, 9 for x8 and 11 for x4 except A10); A10 HIGH enables the auto precharge
>>feature (nonpersistent), A10 LOW disables the auto precharge feature"
>>
>>Just looking at that do you think that all of the addresses that I gave are in
>>the same row?
>>
>>If not, then doesn't that imply that the row will have to be opened for each
>>successive access?
>>
>>I did some DRAM controller design about 10 years ago, and the internals haven't
>>really changed that much. I've never done any DDR design but from a quick look
>>here's my SWAG at it:
>>
>>Let's assume that we need to do an ACTIVE then READ then PRECHARGE with CL=2 DDR
>>RAM operating with a clock frequency of 133 MHz. I believe that this adds up to
>>about 9 clocks which would be almost 70 ns. See tRCD (18 ns) + tRP (18 ns) plus
>>the CL=2 read access. Then you have to add in the additional delays inside of
>>the chipset and the processor.
>>
>>Please point out the missing ns in the above.
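
The ACTIVE / READ / PRECHARGE estimate quoted above can be checked with a few lines of arithmetic. This is only a sketch: the 133 MHz clock, tRCD = 18 ns, tRP = 18 ns and CL=2 come from the quoted post, while the one-clock data burst and the rounding of each timing up to whole clocks are assumptions made here to arrive at the "about 9 clocks". The names below are hypothetical, and chipset and processor delays are not modelled.

```python
import math

# Values taken from the quoted post.
CLOCK_MHZ = 133.0          # DDR clock frequency
T_RCD_NS = 18.0            # ACTIVE to READ delay
T_RP_NS = 18.0             # PRECHARGE period
CAS_LATENCY_CLOCKS = 2     # CL=2

# Assumption (not in the post): the data burst occupies one clock.
BURST_CLOCKS = 1


def ns_to_clocks(t_ns, clock_mhz=CLOCK_MHZ):
    """Round a timing in ns up to a whole number of clocks."""
    period_ns = 1000.0 / clock_mhz
    return math.ceil(t_ns / period_ns)


def row_miss_cost(clock_mhz=CLOCK_MHZ):
    """Return (clocks, ns) for ACTIVE + READ + PRECHARGE at the DRAM itself."""
    period_ns = 1000.0 / clock_mhz
    clocks = (
        ns_to_clocks(T_RCD_NS, clock_mhz)    # open the row
        + CAS_LATENCY_CLOCKS                 # wait for the first data
        + BURST_CLOCKS                       # transfer the data (assumed)
        + ns_to_clocks(T_RP_NS, clock_mhz)   # close the row again
    )
    return clocks, clocks * period_ns


if __name__ == "__main__":
    clocks, total_ns = row_miss_cost()
    print(f"{clocks} clocks, {total_ns:.1f} ns at {CLOCK_MHZ:.0f} MHz")
```

Under these assumptions the total is 9 clocks, a little under 68 ns, which matches the "almost 70 ns" in the quoted estimate. Anything beyond that would have to come from the chipset and the processor.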
Last modified: Thu, 15 Apr 21 08:11:13 -0700