Computer Chess Club Archives

Subject: Re: Since the CPU is what really count for Chess !

Author: Matt Taylor
Date: 12:58:37 03/19/03

On March 19, 2003 at 13:52:31, Robert Hyatt wrote:

>On March 19, 2003 at 12:53:42, Matt Taylor wrote:
>
>>On March 18, 2003 at 23:09:08, Robert Hyatt wrote:
>>
>>>On March 18, 2003 at 19:45:43, Tom Kerrigan wrote:
>>>
>>>>On March 18, 2003 at 18:20:14, Robert Hyatt wrote:
>>>>
>>>>>On March 18, 2003 at 17:46:10, Tom Kerrigan wrote:
>>>>>
>>>>>>On March 18, 2003 at 16:37:35, Robert Hyatt wrote:
>>>>>>
>>>>>>>>>1.  no interleaving, which means that the raw memory latency is stuck at
>>>>>>>>>120+ns and stays there.  Faster bus means nothing without interleaving,
>>>>>>>>>if latency is the problem.
>>>>>>>>
>>>>>>>>Uh, wait a minute, didn't you just write a condescending post to me about how
>>>>>>>>increasing bandwidth improves latency? (Which I disagree with...) You can't have
>>>>>>>>it both ways.
>>>>>>>>
>>>>>>>>Faster bus speed improves both latency and bandwidth. How can it not?
>>>>>>>
>>>>>>>It doesn't affect random latency whatsoever.  It does affect the time taken to
>>>>>>>load a cache line.  Which does affect latency in a different way.  However,
>>>>>>>interleaving does even better as even though it doesn't change latency either,
>>>>>>>it will load a cache line even faster.
>>>>>>
>>>>>>Are you kidding me? How can FSB speed _not_ affect latency?
>>>>>
>>>>>Very simple.  Latency is caused _in_ the memory system, only a tiny part of
>>>>>latency is caused by the delay of shipping the data over the bus.  If you ran the bus
>>>>...
>>>>>Run the test.  This discussion was held on r.g.c.p a while back.  And the _same_
>>>>>results were found.  Memory has 120ns latency no matter _what_ memory you
>>>>>use.  RDRAM is even slower in terms of latency.  If you can get your memory to
>>>>>sub-100ns latency, you've done a miracle in modern electronics.
>>>>
>>>>I guess I'm sitting in front of one miraculous computer, then, because it can
>>>>randomly access a word in 75ns. Just ran the test. (RDRAM, BTW.)
>>>
>>>Yes you are.  You have the fastest single CPU on the planet.  Notice that to
>>>do this test, you have to access a byte, skip down 128 bytes and access another
>>>and repeat this for a _long_ set of addresses.  If you _still_ get 75ns
>>>you _do_ have the fastest PC latency ever reported by any serious tester.
>>
>>AMD thinks so too. The most accurate figure I've found is about 70 ns for the
>>on-die memory controller that Clawhammer has. (I saw some claims of sub-40 ns,
>>but I find that hard to believe.)
>>
>>>>If you have a 133MHz DIMM that's rated at 2-1-1-1, it can obviously access a
>>>>word in 15ns.
>>>
>>>I don't believe 15ns for a second.  Just look at current specs for DRAM and
>>>tell me how that is going to happen?  Again, look at any memory benchmarking
>>>done on the internet by folks that do this for a living.  _nobody_ has reported
>>>sub 100ns latency for any test I have seen, when talking about the PC.  Or
>>>when talking about a sixty million dollar Cray.
>>
>>15 ns is believable. You must remember that RAM is configured as rows and
>>columns. The full 100-120 ns is the latency of opening a new row and reading.
>>You and Tom seem to be talking about different things here. A completely random
>>access is going to hit RAS and stall the full 100-120 ns. Reloading the column
>>will only hit CAS and stall for 15 ns.
>
>The only memory latency that is interesting is "random access latency".
>Anything else plays right into things like RDRAM and makes it look great, when
>its random access latency is bad.  The DDR memory in my dual xeon is 150ns which
>looks poor compared to the 130ns SDRAM in my much cheaper PIII laptop.

Your "cheaper PIII laptop" doesn't need all the extra baggage that comes with
dual processors.
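
To make the row/column distinction from the quoted exchange concrete, here is a minimal sketch in Python. It is a toy model, not a measurement: the 15 ns and 120 ns figures are the ones argued over in this thread, while `ROW_BYTES`, the single open row, and the function names are assumptions made up for illustration.

```python
# Toy model of DRAM access cost, using the figures quoted in this thread.
ROW_MISS_NS = 120.0  # full random access: open a new row (RAS), then read
ROW_HIT_NS = 15.0    # row already open: column access (CAS) only
ROW_BYTES = 2048     # hypothetical row size, chosen only for illustration


def first_word_ns(bus_mhz, first_access_cycles):
    """Time for the first word of a burst, e.g. the '2' in a 2-1-1-1 rating."""
    cycle_ns = 1000.0 / bus_mhz
    return first_access_cycles * cycle_ns


def average_latency_ns(addresses):
    """Average cost per access, assuming one bank with one open row."""
    open_row = None
    total = 0.0
    for addr in addresses:
        row = addr // ROW_BYTES
        total += ROW_HIT_NS if row == open_row else ROW_MISS_NS
        open_row = row
    return total / len(addresses)


if __name__ == "__main__":
    # 133 MHz, 2 cycles for the first word: roughly the 15 ns Tom cites.
    print(first_word_ns(133, 2))
    # Sixteen accesses inside one row: one miss, then fifteen hits.
    print(average_latency_ns(range(0, 2048, 128)))
    # Every access lands in a different row: the full cost each time.
    print(average_latency_ns([i * 4096 for i in range(10)]))
```

Under these assumptions the 15 ns figure only shows up when accesses stay within an open row, and a truly random pattern pays the full cost every time, which is why the two numbers in the quoted exchange can both be right.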

>>>> If the system gets that word in 75ns (ignoring RDRAM vs. DIMM
>>>>latency for now) that means 20% of the latency is from the memory and 80% (not
>>>>"a tiny part") is from "shipping the data over the bus" (and through the
>>>>northbridge). Conventional wisdom says there's a 10ns wire/pin delay for a
>>>>signal going into or out of a chip, so into northbridge + out of northbridge +
>>>>into processor = 30ns. That means 30ns of processing is done on the northbridge
>>>>and processor. That's why everybody is so worked up about Hammer's on-die memory
>>>>controller--it reduces memory latency by, well, somewhere between 20 and 50ns,
>>>>or roughly 50%.
>>>>
>>>>End of today's lecture...
>>>
>>>Now to get some _real_ data before giving the _next_ lecture.  As I said,
>>>access 1M bytes, with a 128 byte stride so cache-line pre-fetching won't
>>>artificially bias the result downward.
>>>
>>>I'll try to run this on a group of dual xeons here tomorrow, starting with my
>>>2.8's and also trying the 3.06's.
>>>
>>>Several of us did this on R.G.C.P a few months back however, and 120+ ns
>>>was the _best_ time reported when the test was run correctly.
>>
>>I got 133 ns as well. Aaron was running tests like crazy this morning on his
>>nForce 2, and he reported times as low as 70 ns. I find that -very- impressive.
>>Of course, that was with massive memory overclocking.
>>
>>-Matt
>
>Still it is the fastest time I have _ever_ seen for memory latency.  The X86
>pipeline to memory is _so_ long.  Going thru the mmu, to L1, to L2 to the bus,
>to the memory controller, to the chip, pulling that off in 70ns reliably is
>_remarkable_.  Particularly when you consider that the latency for 60 million
>dollar computers like the Cray T90 is over 100ns.  The infamous Cray-2, the
>first machine with 32 gigabytes of RAM had a similar latency.  2.1ns clock, 50
>clock cycle latency.

I believe the MMU comes after L1 & L2 to reduce latency of cache accesses. I
thought it was awkward at first (reloading cr3 therefore invalidates the entire
cache), but it makes sense.
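
For reference, the arithmetic behind two of the quoted figures can be tallied in a few lines of Python: Tom's breakdown of a 75 ns access, and the 2.1 ns clock with 50 cycles of latency quoted just above. This is only a sketch of the sums as stated in the thread; the function and key names are made up here, and none of the inputs are measurements of mine.

```python
def latency_budget(total_ns=75.0, dram_ns=15.0, pin_delay_ns=10.0, crossings=3):
    """Split a measured latency the way Tom's quoted post does.

    crossings=3 stands for: into northbridge, out of northbridge, into CPU.
    """
    outside_dram_ns = total_ns - dram_ns       # everything that is not the DRAM itself
    wire_ns = pin_delay_ns * crossings         # chip-boundary (pin/wire) delays
    logic_ns = outside_dram_ns - wire_ns       # left over for northbridge + CPU logic
    return {
        "dram_share": dram_ns / total_ns,
        "outside_dram_ns": outside_dram_ns,
        "wire_ns": wire_ns,
        "logic_ns": logic_ns,
    }


def cycles_to_ns(cycles, clock_ns):
    """Convert a latency in clock cycles to nanoseconds."""
    return cycles * clock_ns


if __name__ == "__main__":
    print(latency_budget())       # 20% DRAM, 30 ns of wire delay, 30 ns of logic
    print(cycles_to_ns(50, 2.1))  # about 105 ns, i.e. "over 100ns"
```

With those inputs the DRAM accounts for 20% of the total, and an on-die memory controller would be attacking the remaining 60 ns.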

>So 70 seems not only good, but _incredibly_ good.  lmbench is pretty good at
>measuring this accurately, so long as you are sure you tell it to use way more
>RAM than will fit into cache.
>
>I broke it on my quad 700 due to the 1MB L2, but when I had it use 128mb for the
>memory tests, things dropped back to 130ns for that box as well, but my quad 700
>used SDRAM which seems to be the best there is right now.

Agreed, 70 ns is superb. AMD talks about Clawhammer's 80 ns latency being a big
deal. I can't get a solid figure (only guesses), but the 70 ns seems accurate.
When Opteron comes out, I plan to get one, and I'll see about testing memory
latency at that point.
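
As a rough illustration of the access pattern being discussed (touch one byte, skip 128 bytes, repeat over a buffer much larger than cache), here is a sketch in Python. The function names and the 16 MB default are my own assumptions. It shows the shape of the test only: interpreter overhead swamps the memory time in Python, so the number it prints is not a hardware latency. A real run would use a compiled loop or lmbench, as mentioned in the quoted text.

```python
import time


def stride_offsets(buffer_bytes, stride_bytes=128):
    """Byte offsets touched by the test: 0, 128, 256, ... up to the buffer end."""
    return range(0, buffer_bytes, stride_bytes)


def time_strided_reads(buffer_bytes=16 * 1024 * 1024, stride_bytes=128, passes=1):
    """Read one byte per stride and return (ns per access, checksum).

    The buffer should be far larger than the L2 cache, otherwise the loop
    measures cache hits rather than trips to main memory.
    """
    buf = bytearray(buffer_bytes)
    offsets = stride_offsets(buffer_bytes, stride_bytes)
    checksum = 0
    start = time.perf_counter()
    for _ in range(passes):
        for off in offsets:
            checksum += buf[off]  # the read being timed
    elapsed = time.perf_counter() - start
    accesses = passes * len(offsets)
    return elapsed * 1e9 / accesses, checksum


if __name__ == "__main__":
    # 1 MB at a 128-byte stride is 8192 accesses, as in the quoted test.
    print(len(stride_offsets(1 << 20)))
    ns_per_access, _ = time_strided_reads()
    print(f"{ns_per_access:.1f} ns per access (dominated by Python overhead)")
```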

-Matt
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.