Computer Chess Club Archives



Subject: Re: Since the CPU is what really count for Chess !

Author: Robert Hyatt

Date: 15:20:14 03/18/03



On March 18, 2003 at 17:46:10, Tom Kerrigan wrote:

>On March 18, 2003 at 16:37:35, Robert Hyatt wrote:
>
>>>>1.  no interleaving, which means that the raw memory latency is stuck at
>>>>120+ns and stays there.  Faster bus means nothing without interleaving,
>>>>if latency is the problem.
>>>
>>>Uh, wait a minute, didn't you just write a condescending post to me about how
>>>increasing bandwidth improves latency? (Which I disagree with...) You can't have
>>>it both ways.
>>>
>>>Faster bus speed improves both latency and bandwidth. How can it not?
>>
>>It doesn't affect random latency whatsoever.  It does affect the time taken
>>to load a cache line, which does affect latency in a different way.  However,
>>interleaving does even better: even though it doesn't change latency either,
>>it will load a cache line even faster.
>
>Are you kidding me? How can FSB speed _not_ affect latency?

Very simple.  Latency is caused _in_ the memory system; only a tiny part of it
is caused by the delay of shipping the data over the bus.  If you ran the bus
at 10GHz, you would _still_ see 120ns latency, because you can't read DRAM any
faster than that for the first read to a random location...



>If you have 133MHz
>CAS2 memory and a 100MHz FSB, it takes 2*(1/100M)=20ns + your northbridge
>overhead to do a random access. Increase the bus speed to 133MHz, now the access
>takes 2*(1/133M)=15ns + northbridge overhead. So it gets 5ns faster. I don't see
>how it could possibly _not_ get faster. Where is the flaw in my logic?


Run the test.  This discussion was held on r.g.c.p a while back, and the
_same_ results were found: memory has 120ns latency no matter _what_ memory
you use.  RDRAM is even slower in terms of latency.  If you can get your
memory to sub-100ns latency, you've worked a miracle of modern electronics.

The problem is that the time required to ship the data across the bus is a
_small_ fraction of the total transaction time.  Driving it to zero still
leaves that 120ns that is spread over everything but the bus.
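
To put rough numbers on it, using the figures above: going from a 100MHz bus
to a 133MHz bus trims the CAS portion from 20ns to 15ns, but that 5ns comes
out of a roughly 120ns total, a gain of about 4%.  The other 100-plus
nanoseconds of row access, precharge, bridge overhead, and cache-miss handling
don't move at all.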





>
>Second, are you sure an entire cache line has to be filled before any of the
>data can be used?

No, but if you fire off _another_ memory access, _that_ one can't be started
until the previous cache line fill is completed.  I thought I had said that;
if it wasn't clear, that was my fault.  However, on average, half of the cache
line has to be filled before the data is available, since on average half of
the accesses will fall in the front half of the line, and half in the back
half...
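
As a concrete example with round numbers (assuming a 64-byte line filled
8 bytes per bus beat): a controller that always starts at the front delivers
the requested word after anywhere from 1 to 8 beats, or 4.5 beats on average,
so a random access waits on roughly half the fill time on top of the initial
DRAM latency.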


>I thought memory timings like 2-1-1-1 meant that the word that
>was requested took 2 cycles to access whereas the neighboring words then took 1
>cycle each to access during the burst transfer. If the requested word is being
>sent first, you'd think the northbridge and processor would take advantage of
>that fact. Do you have any information to the contrary? I mean, if that _wasn't_
>the case, don't you think memory timings would be given as the total of all the
>numbers, e.g., 5 instead of 2-1-1-1? I mean, if the 2 isn't significant, why
>advertise it?
>


So far as I know, current cache controllers start at the front of a line.
There was talk a few years back about changing this, and I haven't tried to
follow it terribly closely, but the logic to start at the middle of a line and
then wrap back around to the front was (at the time) deemed way too complex to
deal with.  I saw this when preliminary discussions about the original
Pentium Pro were being held.  I don't recall how it all ended, however.

The point of modern burst-mode memory is that the first 8 bytes take roughly
120ns to fetch, then the next N bytes come across at near bus speed.  But that
first delay is the killer I was talking about.
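
Again with round numbers, assuming a 64-byte line over an 8-byte bus at 100MHz
(10ns per beat): the first 8 bytes arrive after the ~120ns latency, and the
remaining 7 beats stream across in another 70ns, so the whole line lands in
roughly 190ns.  Raising the bus to 133MHz (7.5ns per beat) only compresses the
streaming part, from 70ns to about 52ns; the 120ns up-front cost is untouched.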

The original timing number is the raw overall latency to start a read of a
particular memory column.  That is where the derivatives such as fast page
mode, EDO, SDRAM, RDRAM, and so forth started: by reading more data internally
and, after suffering thru the first huge latency delay, dumping the remainder
of the requested data as quickly as possible.  That is also why RDRAM is so
damned bad for chess: the initial latency is even higher, although the
streaming bandwidth is much higher once you suffer thru the initial start-up
delay.

There are some good latency analysis programs floating around on the net, for
anyone wanting to see that their "60ns DRAM memory" is really 120ns when you
factor in the entire pipeline from memory chip to processor: thru two levels
of cache, the bus, the bridge, etc...
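
For anyone who wants to measure this themselves, the usual trick is a
pointer-chasing loop: build a random cycle through a buffer much larger than
the caches, then walk it, so every load depends on the previous one and
nothing can be overlapped or prefetched.  A minimal sketch in C follows; the
buffer size, step count, and RNG constants are illustrative choices, not taken
from any particular tool:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N     (4 * 1024 * 1024)   /* 4M pointers = 32MB: far larger than any cache */
    #define STEPS (8 * 1024 * 1024)   /* dependent loads to time */

    int main(void) {
        void **arr = malloc(N * sizeof *arr);
        size_t *idx = malloc(N * sizeof *idx);
        if (!arr || !idx) return 1;

        /* Fisher-Yates shuffle driven by a 64-bit LCG, so the chain below
           visits all N slots in one random cycle and each load lands on an
           unpredictable cache line. */
        for (size_t i = 0; i < N; i++) idx[i] = i;
        unsigned long long rng = 88172645463325252ULL;
        for (size_t i = N - 1; i > 0; i--) {
            rng = rng * 6364136223846793005ULL + 1442695040888963407ULL;
            size_t j = (size_t)((rng >> 33) % (i + 1));
            size_t t = idx[i]; idx[i] = idx[j]; idx[j] = t;
        }
        for (size_t i = 0; i < N; i++)
            arr[idx[i]] = &arr[idx[(i + 1) % N]];
        free(idx);

        /* Walk the cycle: each iteration is one dependent random load, so
           elapsed time / STEPS approximates round-trip memory latency. */
        void **p = (void **)arr[0];
        clock_t t0 = clock();
        for (long i = 0; i < STEPS; i++)
            p = (void **)*p;
        double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

        /* Printing p keeps the loop from being optimized away. */
        printf("%.1f ns per dependent load (p=%p)\n",
               secs * 1e9 / (double)STEPS, (void *)p);
        free(arr);
        return 0;
    }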

Running up the bus speed can help, but only on the small fraction of the time
that is actually spent in the bus transaction.  And that time really is
small...


>-Tom


