Computer Chess Club Archives



Subject: Re: Since the CPU is what really count for Chess !

Author: Robert Hyatt

Date: 15:20:14 03/18/03



On March 18, 2003 at 17:46:10, Tom Kerrigan wrote:

>On March 18, 2003 at 16:37:35, Robert Hyatt wrote:
>
>>>>1.  no interleaving, which means that the raw memory latency is stuck at
>>>>120+ns and stays there.  Faster bus means nothing without interleaving,
>>>>if latency is the problem.
>>>
>>>Uh, wait a minute, didn't you just write a condescending post to me about how
>>>increasing bandwidth improves latency? (Which I disagree with...) You can't have
>>>it both ways.
>>>
>>>Faster bus speed improves both latency and bandwidth. How can it not?
>>
>>It doesn't affect random latency whatsoever.  It does affect the time taken
>>to load a cache line, which does affect latency in a different way.  However,
>>interleaving does even better: even though it doesn't change latency either,
>>it will load a cache line even faster.
>
>Are you kidding me? How can FSB speed _not_ affect latency?

Very simple.  Latency is caused _in_ the memory system; only a tiny part of it
is caused by the delay of shipping the data over the bus.  If you ran the bus
at 10GHz, you would _still_ see 120ns latency, because you can't read DRAM any
faster than that for the first read to a random location...



>If you have 133MHz
>CAS2 memory and a 100MHz FSB, it takes 2*(1/100M)=20ns + your northbridge
>overhead to do a random access. Increase the bus speed to 133MHz, now the access
>takes 2*(1/133M)=15ns + northbridge overhead. So it gets 5ns faster. I don't see
>how it could possibly _not_ get faster. Where is the flaw in my logic?


Run the test.  This discussion was held on r.g.c.p a while back, and the
_same_ results were found: memory has 120ns latency no matter _what_ memory
you use.  RDRAM is even slower in terms of latency.  If you can get your
memory to sub-100ns latency, you've worked a miracle of modern electronics.

The problem is that the time required to ship the data across the bus is a
_small_ fraction of the total transaction time.  Driving it to zero still
leaves that 120ns that is spread over everything but the bus.
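
To put rough numbers on it, using the figures above: going from a 100MHz bus
to a 133MHz bus trims the CAS portion from 20ns to 15ns, but that 5ns comes
out of a roughly 120ns total, a gain of about 4%.  The other 100-plus
nanoseconds of row access, precharge, bridge overhead, and cache-miss handling
don't move at all.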





>
>Second, are you sure an entire cache line has to be filled before any of the
>data can be used?

No, but if you fire off _another_ memory access, _that_ one can't be started
until the previous cache line fill is completed.  I thought I had said that;
if it wasn't clear, that was my fault.  However, on average, half of the cache
line has to be filled before the data is available, since on average half of
the accesses will fall in the front half of the line, and half in the back
half...
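
As a concrete example with round numbers (assuming a 64-byte line filled
8 bytes per bus beat): a controller that always starts at the front delivers
the requested word after anywhere from 1 to 8 beats, or 4.5 beats on average,
so a random access waits on roughly half the fill time on top of the initial
DRAM latency.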


>I thought memory timings like 2-1-1-1 meant that the word that
>was requested took 2 cycles to access whereas the neighboring words then took 1
>cycle each to access during the burst transfer. If the requested word is being
>sent first, you'd think the northbridge and processor would take advantage of
>that fact. Do you have any information to the contrary? I mean, if that _wasn't_
>the case, don't you think memory timings would be given as the total of all the
>numbers, e.g., 5 instead of 2-1-1-1? I mean, if the 2 isn't significant, why
>advertise it?
>


So far as I know, current cache controllers start at the front of a line.
There was talk a few years back about changing this, and I haven't tried to
follow it terribly closely, but the logic to start at the middle of a line and
then wrap back around to the front was (at the time) deemed way too complex to
deal with.  I saw this when preliminary discussions about the original
Pentium Pro were being held.  I don't recall how it all ended, however.

The point of modern burst-mode memory is that the first 8 bytes take roughly
120ns to fetch, then the next N bytes come across at near bus speed.  But that
first delay is the killer I was talking about.
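
Again with round numbers, assuming a 64-byte line over an 8-byte bus at 100MHz
(10ns per beat): the first 8 bytes arrive after the ~120ns latency, and the
remaining 7 beats stream across in another 70ns, so the whole line lands in
roughly 190ns.  Raising the bus to 133MHz (7.5ns per beat) only compresses the
streaming part, from 70ns to about 52ns; the 120ns up-front cost is untouched.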

The original timing number is the raw overall latency to start a read of a
particular memory column.  That is where the derivatives such as fast page
mode, EDO, SDRAM, RDRAM, and so forth started: by reading more data internally
and, after suffering thru the first huge latency delay, dumping the remainder
of the requested data as quickly as possible.  That is also why RDRAM is so
damned bad for chess: the initial latency is even higher, although the
streaming bandwidth is much higher once you suffer thru the initial start-up
delay.

There are some good latency analysis programs floating around on the net, for
anyone wanting to see that their "60ns DRAM memory" is really 120ns when you
factor in the entire pipeline from memory chip to processor: thru two levels
of cache, the bus, the bridge, etc...
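
For anyone who wants to measure this themselves, the usual trick is a
pointer-chasing loop: build a random cycle through a buffer much larger than
the caches, then walk it, so every load depends on the previous one and
nothing can be overlapped or prefetched.  A minimal sketch in C follows; the
buffer size, step count, and RNG constants are illustrative choices, not taken
from any particular tool:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N     (4 * 1024 * 1024)   /* 4M pointers = 32MB: far larger than any cache */
    #define STEPS (8 * 1024 * 1024)   /* dependent loads to time */

    int main(void) {
        void **arr = malloc(N * sizeof *arr);
        size_t *idx = malloc(N * sizeof *idx);
        if (!arr || !idx) return 1;

        /* Fisher-Yates shuffle driven by a 64-bit LCG, so the chain below
           visits all N slots in one random cycle and each load lands on an
           unpredictable cache line. */
        for (size_t i = 0; i < N; i++) idx[i] = i;
        unsigned long long rng = 88172645463325252ULL;
        for (size_t i = N - 1; i > 0; i--) {
            rng = rng * 6364136223846793005ULL + 1442695040888963407ULL;
            size_t j = (size_t)((rng >> 33) % (i + 1));
            size_t t = idx[i]; idx[i] = idx[j]; idx[j] = t;
        }
        for (size_t i = 0; i < N; i++)
            arr[idx[i]] = &arr[idx[(i + 1) % N]];
        free(idx);

        /* Walk the cycle: each iteration is one dependent random load, so
           elapsed time / STEPS approximates round-trip memory latency. */
        void **p = (void **)arr[0];
        clock_t t0 = clock();
        for (long i = 0; i < STEPS; i++)
            p = (void **)*p;
        double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

        /* Printing p keeps the loop from being optimized away. */
        printf("%.1f ns per dependent load (p=%p)\n",
               secs * 1e9 / (double)STEPS, (void *)p);
        free(arr);
        return 0;
    }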

Running up the bus speed can help, but only on the small fraction of the time
that is actually spent in the bus transaction.  And that time really is
small...


>-Tom


