Computer Chess Club Archives


Subject: Re: Since the CPU is what really count for Chess !

Author: Matt Taylor

Date: 09:53:42 03/19/03


On March 18, 2003 at 23:09:08, Robert Hyatt wrote:

>On March 18, 2003 at 19:45:43, Tom Kerrigan wrote:
>
>>On March 18, 2003 at 18:20:14, Robert Hyatt wrote:
>>
>>>On March 18, 2003 at 17:46:10, Tom Kerrigan wrote:
>>>
>>>>On March 18, 2003 at 16:37:35, Robert Hyatt wrote:
>>>>
>>>>>>>1.  no interleaving, which means that the raw memory latency is stuck at
>>>>>>>120+ns and stays there.  Faster bus means nothing without interleaving,
>>>>>>>if latency is the problem.
>>>>>>
>>>>>>Uh, wait a minute, didn't you just write a condescending post to me about how
>>>>>>increasing bandwidth improves latency? (Which I disagree with...) You can't have
>>>>>>it both ways.
>>>>>>
>>>>>>Faster bus speed improves both latency and bandwidth. How can it not?
>>>>>
>>>>>It doesn't affect random latency whatsoever.  It does affect the time taken to
>>>>>load a cache line.  Which does affect latency in a different way.  However,
>>>>>interleaving does even better as even though it doesn't change latency either,
>>>>>it will load a cache line even faster.
>>>>
>>>>Are you kidding me? How can FSB speed _not_ affect latency?
>>>
>>>Very simple.  Latency is caused _in_ the memory system, only a tiny part of
>>>latency is caused by the delay of shipping the data over the bus.  If you ran
>>>the bus
>>...
>>>Run the test.  This discussion was held on r.g.c.p a while back.  And the _same_
>>>results were found.  Memory has 120ns latency no matter _what_ memory you
>>>use.  RDRAM is even slower in terms of latency.  If you can get your memory to
>>>sub-100ns latency, you've done a miracle in modern electronics.
>>
>>I guess I'm sitting in front of one miraculous computer, then, because it can
>>randomly access a word in 75ns. Just ran the test. (RDRAM, BTW.)
>
>Yes you are.  You have the fastest single CPU on the planet.  Notice that to
>do this test, you have to access a byte, skip down 128 bytes and access another
>and repeat this for a _long_ set of addresses.  If you _still_ get 75ns
>you _do_ have the fastest PC latency ever reported by any serious tester.

AMD thinks so too. The most accurate figure I've found is about 70 ns for the
on-die memory controller that Clawhammer has. (I saw some claims of sub-40 ns,
but I find that hard to believe.)

>>If you have a 133MHz DIMM that's rated at 2-1-1-1, it can obviously access a
>>word in 15ns.
>
>I don't believe 15ns for a second.  Just look at current specs for DRAM and
>tell me how that is going to happen?  Again, look at any memory benchmarking
>done on the internet by folks that do this for a living.  _nobody_ has reported
>sub 100ns latency for any test I have seen, when talking about the PC.  Or
>when talking about a sixty million dollar Cray.

15 ns is believable. You must remember that RAM is organized as rows and
columns. The full 100-120 ns is the latency of opening a new row and then
reading from it. You and Tom seem to be talking about different things here. A
completely random access has to go through RAS and stalls for the full
100-120 ns. Reading another column from a row that is already open only goes
through CAS and stalls for about 15 ns.
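
To put rough numbers on the two cases, here is a back-of-the-envelope sketch in
Python. The 133 MHz clock and the 2-1-1-1 rating are Tom's figures, and 15 ns
and 100-120 ns are the figures above. Using 120 ns for a row miss (the upper
end of the range), the row-hit fractions, and all of the function names are
assumptions made for illustration, not measurements.

```python
# Back-of-the-envelope model of the two latencies discussed above.
# All names are illustrative; only the ns and MHz figures come from the thread.

BUS_MHZ = 133.0            # DIMM clock from Tom's example
FIRST_WORD_CYCLES = 2      # the leading "2" in a 2-1-1-1 rating

ROW_HIT_NS = 15.0          # row already open: CAS only
ROW_MISS_NS = 120.0        # assumption: upper end of the 100-120 ns range


def cycle_ns(mhz):
    """Length of one bus clock in nanoseconds."""
    return 1000.0 / mhz


def first_word_ns(mhz=BUS_MHZ, cycles=FIRST_WORD_CYCLES):
    """Time to the first word when the row is already open."""
    return cycles * cycle_ns(mhz)


def average_latency_ns(row_hit_fraction):
    """Expected latency when this fraction of accesses hits an open row."""
    if not 0.0 <= row_hit_fraction <= 1.0:
        raise ValueError("row_hit_fraction must be between 0 and 1")
    return (row_hit_fraction * ROW_HIT_NS
            + (1.0 - row_hit_fraction) * ROW_MISS_NS)


if __name__ == "__main__":
    print(f"2 cycles at 133 MHz: {first_word_ns():.1f} ns")
    for hits in (0.0, 0.5, 1.0):
        print(f"row hits {hits:.0%}: {average_latency_ns(hits):.1f} ns average")
```

Two cycles at 133 MHz comes to roughly 15 ns, which is where Tom's figure comes
from. With no row hits at all the model gives the full row-miss latency, which
is the case the random-access test is meant to measure.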

>> If the system gets that word in 75ns (ignoring RDRAM vs. DIMM
>>latency for now) that means 20% of the latency is from the memory and 80% (not
>>"a tiny part") is from "shipping the data over the bus" (and through the
>>northbridge). Conventional wisdom says there's a 10ns wire/pin delay for a
>>signal going into or out of a chip, so into northbridge + out of northbridge +
>>into processor = 30ns. That means 30ns of processing is done on the northbridge
>>and processor. That's why everybody is so worked up about Hammer's on-die memory
>>controller--it reduces memory latency by, well, somewhere between 20 and 50ns,
>>or roughly 50%.
>>
>>End of today's lecture...
>
>Now to get some _real_ data before giving the _next_ lecture.  As I said,
>access 1M bytes, with a 128 byte stride so cache-line pre-fetching won't
>artificially bias the result downward.
>
>I'll try to run this on a group of dual xeons here tomorrow, starting with my
>2.8's and also trying the 3.06's.
>
>Several of us did this on R.G.C.P a few months back however, and 120+ ns
>was the _best_ time reported when the test was run correctly.

I got 133 ns as well. Aaron was running tests like crazy this morning on his
nForce 2, and he reported times as low as 70 ns. I find that -very- impressive.
Of course, that was with massive memory overclocking.
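
For anyone who wants to reproduce these numbers, the access pattern from the
quoted test (1M bytes, 128 byte stride) can be sketched as follows. This is
Python, so it only lays out the offsets and converts a measured time into ns
per access; the timing loop itself would have to be written in C or assembly.
The 64-byte cache line size and all of the function names are assumptions, not
anything stated in the thread.

```python
# Sketch of the strided-access pattern described above: touch one byte,
# skip 128 bytes, touch the next, across a 1M-byte buffer.

BUFFER_BYTES = 1 << 20     # "access 1M bytes"
STRIDE_BYTES = 128         # "with a 128 byte stride"
LINE_BYTES = 64            # assumption: cache line size, not given in the thread


def stride_offsets(buffer_bytes=BUFFER_BYTES, stride=STRIDE_BYTES):
    """Byte offsets touched in one pass over the buffer."""
    return range(0, buffer_bytes, stride)


def lines_touched(offsets, line_bytes=LINE_BYTES):
    """Number of distinct cache lines the offsets fall into."""
    return len({off // line_bytes for off in offsets})


def ns_per_access(elapsed_seconds, accesses):
    """Convert a measured wall-clock time into nanoseconds per access."""
    return elapsed_seconds * 1e9 / accesses


if __name__ == "__main__":
    offsets = stride_offsets()
    print(len(offsets), "accesses per pass")
    print(lines_touched(offsets), "distinct cache lines touched")
```

Under these assumptions every access lands in its own cache line, so no access
is served by a line that an earlier access already pulled in. Feed the elapsed
time from a real timing loop into ns_per_access to get a figure comparable to
the ones quoted here.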

-Matt



Last modified: Thu, 15 Apr 21 08:11:13 -0700
