Computer Chess Club Archives



Subject: Re: Precharging at DDR ram

Author: Vincent Diepeveen

Date: 04:51:05 07/17/03



Seemingly Hyatt has been asking around:

http://www.talkchess.com/forums/1/message.html?306766

On July 17, 2003 at 00:26:21, Keith Evans wrote:

>On July 16, 2003 at 22:40:10, Vincent Diepeveen wrote:
>
>>On July 16, 2003 at 13:04:40, Keith Evans wrote:
>>
>>>On July 16, 2003 at 07:20:50, Vincent Diepeveen wrote:
>>>
>>>>On July 16, 2003 at 00:44:34, Keith Evans wrote:
>>>>
>>>>>On July 16, 2003 at 00:29:43, Robert Hyatt wrote:
>>>>>
>>>>>>On July 16, 2003 at 00:05:29, Keith Evans wrote:
>>>>>>
>>>>>>>On July 15, 2003 at 23:35:30, Robert Hyatt wrote:
>>>>>>>
>>>>>>>>On July 15, 2003 at 23:05:37, Vincent Diepeveen wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>Now I can again disprove the 130 ns figure that Bob keeps giving here for dual
>>>>>>>>>machines, and something even faster than that for a single CPU (up to 60 ns or
>>>>>>>>>something). Then I'm sure he'll soon be modifying his statement to something like
>>>>>>>>>"it is not interesting to know the time of a hashtable lookup;
>>>>>>>>>instead, the only scientifically interesting thing
>>>>>>>>>to know is how much bandwidth a machine can actually achieve".
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>What is _interesting_ is the fact that you are incapable of even recalling
>>>>>>>>the numbers I posted.
>>>>>>>>
>>>>>>>>to wit:
>>>>>>>>
>>>>>>>>Dual Xeon 2.8 GHz, 400 MHz FSB: 149 ns latency.
>>>>>>>>
>>>>>>>>PIII/750 laptop, SDRAM: 125 ns.
>>>>>>>>
>>>>>>>>Aaron posted the 60+ ns numbers for his overclocked Athlon.  I assume his
>>>>>>>>numbers are as accurate as mine since he _did_ run lmbench, rather than
>>>>>>>>something with potential bugs.
>>>>>>>>
>>>>>>>>I can post bandwidth numbers if you want, but that has nothing to do with
>>>>>>>>latency, as those of us understanding architecture already know.
>>>>>>>>
>>>>>>>
>>>>>>>Can you run lmbench and give the latency numbers for different stride sizes?
>>>>>>>Then you could quote numbers from cache,...
>>>>>>>
>>>>>>
>>>>>>Here's my laptop data.  L1 seems to be 4 clocks, L2 9 clocks, and memory
>>>>>>at 130 ns.  This is a PIII/750 MHz machine with SDRAM.  I just ran it again
>>>>>>to produce these numbers.
>>>>>>
>>>>>>
>>>>>>
>>>>>>Host                 OS   Mhz   L1 $   L2 $    Main mem    Guesses
>>>>>>--------- -------------   ---   ----   ----    --------    -------
>>>>>>scrappy    Linux 2.4.20   744 4.0370 9.4300       130.2
>>>>>>
>>>>>>>In the lmbench paper they have a nice graph like this.
>>>>>>
>>>>>>
>>>>>>Is the above what you want?
>>>>>
>>>>>I think that it's as close as you're going to get. The most important thing is
>>>>>that 130 [ns] is the largest number. And wouldn't that be a little bit
>>>>>pessimistic even for chess hash tables?
>>>>
>>>>This is optimistic, because those latency numbers are sequential latency
>>>>numbers. You can read faster from already opened gates at the RAM than if you
>>>>must open a new one at a random spot.
>>>>
>>>>Trivially, with hashtables you have not opened it at that random spot yet.
>>>>
>>>>That is an additional latency that adds to this 130 ns. Most likely that
>>>>will add up to somewhere above 280 ns, up to 400 ns, for dual Xeons with 133 MHz DDR RAM.
>>>>
>>>>Best regards,
>>>>Vincent
>>>
>>>Let's take a simple example for starters:
>>>
>>>Say that you read from memory location 0x00000000, then 0x01000000, then
>>>0x02000000.
>>>
>>>Do you define this as sequential? What hardware mechanism makes the accesses at
>>>0x01000000 and 0x02000000 occur faster than the first access to location
>>>0x00000000?
>>
>>http://www.vml.co.uk/Data/ddr_256mbit.pdf
>>
>>It describes it a bit. In this case for DDR ram.
>>
>>See for example page 8 the one last line.
>>
>>"200 clock cycles are required between the DLL reset and any read command"
>>
>>
>>then in page 17 the explanation:
>>  "the read command is used to initiate a burst read access to an active row.
>>   ... if auto precharge is selected, the row being accessed will be precharged
>>at the end of the read burst; if auto precharge is not selected  then the row
>>will remain opened for subsequent accesses"
>>
>>
>>And don't forget to check out page 21.
>>
>>And so on. There is enough data there.
>
>Do you know what a DLL is? It's a delay locked loop, something similar to but
>simpler than a PLL (phase locked loop). These are often used in digital circuits
>for things like doubling a clock frequency, getting delays which are a fraction
>of a clock long, and so on. (Xilinx has some good material on this which you can check
>out.)
>
>Now the quote that you gave from page 8 is from the section "Initialization -
>DDR SDRAMs must be powered up and initialized in a predefined manner". I don't
>know why you think that this has anything to do with normal reads or writes. The
>200 clock cycles that you refer to are typically a one-time operation.
>
>I already know about the second item that you quoted. Notice that my addresses
>were not in the same row, so this does not apply.
>
>You might look at the part that says:
>"3. BA0-BA1 provide bank address and A0-A12 provide row address.
> 4. BA0-BA1 provide bank address; A0-Ai provide column address (where i=8 for
>x16, 9 for x8 and 11 for x4 except A10); A10 HIGH enables the auto precharge
>feature (nonpersistent), A10 LOW disables the auto precharge feature"
>
>Just looking at that do you think that all of the addresses that I gave are in
>the same row?
>
>If not, then doesn't that imply that the row will have to be opened for each
>successive access?
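
The row question above can be made concrete with a small sketch. The field widths (13 row bits A0-A12, 2 bank bits BA0-BA1, 10 column bits for an x8 part) come from the datasheet note quoted above. Everything else is an assumption for illustration only: a 64-bit-wide module (3 byte-lane bits) and a controller that maps address bits, from low to high, as byte, column, bank, row. Real chipsets may interleave these bits differently, and `decode` is a hypothetical helper name.

```python
# Hypothetical physical-address layout (an assumption, not any real chipset's mapping):
#   bits 0-2   byte within the 64-bit module word
#   bits 3-12  column (10 bits, x8 part)
#   bits 13-14 bank   (BA0-BA1)
#   bits 15-27 row    (A0-A12)
BYTE_BITS = 3
COLUMN_BITS = 10
BANK_BITS = 2
ROW_BITS = 13


def decode(addr):
    """Split a physical address into (row, bank, column) under the assumed layout."""
    column = (addr >> BYTE_BITS) & ((1 << COLUMN_BITS) - 1)
    bank = (addr >> (BYTE_BITS + COLUMN_BITS)) & ((1 << BANK_BITS) - 1)
    row = (addr >> (BYTE_BITS + COLUMN_BITS + BANK_BITS)) & ((1 << ROW_BITS) - 1)
    return row, bank, column


if __name__ == "__main__":
    for addr in (0x00000000, 0x01000000, 0x02000000):
        row, bank, column = decode(addr)
        print(f"0x{addr:08x}: row={row} bank={bank} column={column}")
```

Under this assumed mapping the three addresses fall in the same bank but in three different rows, so each access would need its own row activation.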
>
>I did some DRAM controller design about 10 years ago, and the internals haven't
>really changed that much. I've never done any DDR design but from a quick look
>here's my SWAG at it:
>
>Let's assume that we need to do an ACTIVE, then READ, then PRECHARGE with CL=2 DDR
>RAM operating with a clock frequency of 133 MHz. I believe that this adds up to
>about 9 clocks which would be almost 70 ns. See tRCD (18 ns) + tRP (18 ns) plus
>the CL=2 read access. Then you have to add in the additional delays inside of
>the chipset and the processor.
>
>Please point out the missing ns in the above.
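
The clock arithmetic in the estimate above can be checked with a short sketch. The tRCD and tRP values, CL=2 and the 133 MHz clock are taken from the post. Rounding each nanosecond timing up to whole clocks, and counting one extra clock for the data transfer (`DATA_CLOCKS`), are assumptions made here to arrive at the "about 9 clocks" figure; the post does not say how the ninth clock is accounted for.

```python
import math

CLOCK_MHZ = 133.0
T_CK_NS = 1000.0 / CLOCK_MHZ  # one clock period, roughly 7.5 ns

T_RCD_NS = 18.0   # ACTIVE to READ delay, from the post
T_RP_NS = 18.0    # PRECHARGE period, from the post
CAS_LATENCY = 2   # CL=2, already expressed in clocks
DATA_CLOCKS = 1   # assumption: one clock for the data burst


def ns_to_clocks(t_ns, t_ck_ns=T_CK_NS):
    """Round a timing given in nanoseconds up to whole clock cycles."""
    return math.ceil(t_ns / t_ck_ns)


total_clocks = (
    ns_to_clocks(T_RCD_NS) + CAS_LATENCY + DATA_CLOCKS + ns_to_clocks(T_RP_NS)
)
total_ns = total_clocks * T_CK_NS

if __name__ == "__main__":
    print(f"tRCD = {ns_to_clocks(T_RCD_NS)} clocks, tRP = {ns_to_clocks(T_RP_NS)} clocks")
    print(f"total = {total_clocks} clocks = {total_ns:.1f} ns at the DRAM pins")
```

This covers only the DRAM device itself; chipset and processor delays come on top, as the post notes.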




Last modified: Thu, 07 Jul 11 08:48:38 -0700
