Computer Chess Club Archives




Subject: Re: Matt Taylor's magic de Bruijn Constant

Author: Robert Hyatt

Date: 13:07:27 07/14/03


On July 14, 2003 at 15:33:37, Gerd Isenberg wrote:

>On July 14, 2003 at 10:54:49, Vincent Diepeveen wrote:
>>On July 13, 2003 at 17:10:10, Russell Reagan wrote:
>>>On July 13, 2003 at 13:17:56, Bas Hamstra wrote:
>>>>It is used *extremely* intensively. Therefore I assumed that most of the time the
>>>>table sits in cache. But apparently not... Makes you wonder about other simple
>>>>lookups. A lot of 10-cycle penalties, it seems.
>>>Hi Bas,
>>>Why do you say "10 cycles"? I thought memory latency was many more cycles (~75 -
>>A random read from memory on a dual P4 or dual K7 takes nearly 400 nanoseconds.
>>So at 2 GHz that's around 800 cycles.
>>Best regards,
>Hi Vincent,
>puhh... that's about 1/2 microsecond. I remember the days with
>2 MHz 8085 or Z80 CPUs. I can't believe it. A few questions...

Don't believe it because it is _wrong_.  Run lmbench on your computer.
It will very accurately measure random access latency.  The slowest I have
seen is 150ns on my dual, using registered DDR SDRAM.  My laptop uses SDRAM and
clocks in around 120ns.  My quad Xeons are all around 125ns.

I've not seen any 400+ ns numbers.  It is possible that Rambus memory is
that slow on latency, even though it is very fast on bandwidth.
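
For readers without lmbench at hand, a random access latency measurement of this kind can be approximated with a pointer chase: each load depends on the result of the previous one, so the hardware cannot overlap them. The following is a minimal C sketch, not lmbench's actual code. The function names (build_chain, chase, ns_per_load) are illustrative assumptions, and the figure it reports also includes TLB misses and timer granularity, so treat it as a rough estimate.

```c
#include <stdlib.h>
#include <time.h>

/* Fill next[] with a random single-cycle permutation (Sattolo's algorithm),
   so following next[i] visits every slot once before returning to the start.
   Note: rand() has a small range on some platforms, which makes the order
   less random for very large tables. */
static void build_chain(size_t *next, size_t n)
{
    if (n == 0)
        return;
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;      /* 0 <= j < i */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
}

/* Perform `steps` dependent loads and return the index reached. */
static size_t chase(const size_t *next, size_t start, size_t steps)
{
    size_t p = start;
    while (steps--)
        p = next[p];
    return p;
}

/* Average nanoseconds per dependent load over a table of n slots.
   Returns a negative value on bad arguments or allocation failure. */
static double ns_per_load(size_t n, size_t steps)
{
    if (n < 2 || steps == 0)
        return -1.0;
    size_t *next = malloc(n * sizeof *next);
    if (next == NULL)
        return -1.0;
    build_chain(next, n);
    clock_t t0 = clock();
    volatile size_t sink = chase(next, 0, steps);  /* keep the loop alive */
    clock_t t1 = clock();
    (void)sink;
    free(next);
    return (double)(t1 - t0) / CLOCKS_PER_SEC * 1e9 / (double)steps;
}
```

Calling ns_per_load with a table much larger than the L2 cache (tens of megabytes) should approach main-memory latency, while a table of a few kilobytes should stay in L1. The exact sizes depend on the machine.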

>I'm not familiar with dual architectures. Is it a kind of shared memory via the
>PCI bus? How do you access such RAM? Are there some alloc-like API functions?
>What happens if one processor writes this memory through its cache? What about
>possible cached copies of this address in the other processor, or in general, how
>do the several processor caches synchronise?
>I guess each processor has its own local main memory.

No.  Each processor sits on the same bus with memory, so both can access
it independently.  That makes cache coherency a problem, but in the Intel
world it is handled by some clever cache design: each cache controller is
aware of what the "other cache" is doing and knows when the other cache
modifies a value that is also in the local cache.  It's messy, but it works.

Caches still use a write-back update policy, so memory is not updated until
the modified cache line is about to be overwritten.  However, if two caches
hold the same cache line (the same memory addresses) and one modifies any part
of it, the other invalidates its copy so that its next read will refresh
things correctly.
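
The write-back and invalidate behaviour described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: two caches, one cache line holding a single int, and three states (a reduced form of the MESI protocol that real Intel hardware uses). All names here are hypothetical and do not correspond to any hardware interface.

```c
/* Per-line state: no copy, clean copy, or dirty copy not yet in memory. */
enum line_state { INVALID, SHARED, MODIFIED };

struct cache {
    enum line_state st;
    int value;
};

static int memory_value = 0;    /* the one memory location being modelled */

/* Write-back: a dirty line is copied to memory and becomes clean. */
static void writeback(struct cache *c)
{
    if (c->st == MODIFIED) {
        memory_value = c->value;
        c->st = SHARED;
    }
}

/* Read: on a miss, the other cache flushes any dirty copy first,
   then the line is refilled from memory. */
static int cache_read(struct cache *self, struct cache *other)
{
    if (self->st == INVALID) {
        writeback(other);
        self->value = memory_value;
        self->st = SHARED;
    }
    return self->value;
}

/* Write: the other cache's copy is invalidated; memory itself is
   NOT updated until a later write-back. */
static void cache_write(struct cache *self, struct cache *other, int v)
{
    writeback(other);
    other->st = INVALID;
    self->value = v;
    self->st = MODIFIED;
}
```

In this model, after one cache writes, memory still holds the old value and the other cache's copy is invalid. The other cache's next read forces the write-back and picks up the new value.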

>Do you know the read latencies of single processor P4 or K7 with state of the
>art chipsets?

Typical numbers are in the 120-150ns range.  Lower for non-registered type
memory.  Registered memory is mainly used in duals that are set up as servers,
for higher reliability.

Aaron has a sub-75ns latency machine that is overclocked.  That's the fastest
PC latency I have ever seen.  In fact, it is probably the fastest latency of
any kind I have seen, period.

>1.) if data is already in the 1st-level cache

This is a one-cycle deal.

>2.) if data is in the 2nd-level cache but not in the 1st

This is something like 6 cycles but I don't think there is a standard
"number" here since processor speeds vary so much.

>3.) in worst case, if data is only in main memory but in no cache

125ns is a good first approximation.

You can answer _all_ of the above by running lmbench.  It will tell
you each one of those numbers, plus others.
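
Since the thread mixes nanoseconds and cycles, the conversion is worth spelling out: a clock of f GHz ticks f times per nanosecond, so cycles = nanoseconds * GHz. A small C sketch follows; the function names are illustrative assumptions, and the 2 GHz clock is the one used earlier in the thread.

```c
/* Convert a latency in nanoseconds to core clock cycles. */
static double latency_cycles(double latency_ns, double clock_ghz)
{
    return latency_ns * clock_ghz;
}

/* Convert a latency in core clock cycles back to nanoseconds. */
static double latency_ns(double cycles, double clock_ghz)
{
    return cycles / clock_ghz;
}
```

At 2 GHz, latency_cycles(125.0, 2.0) gives 250 cycles for a main-memory access, latency_cycles(400.0, 2.0) gives the 800 cycles quoted above, and a 6-cycle L2 hit corresponds to latency_ns(6.0, 2.0), which is 3 ns.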

>Thanks in advance,


Last modified: Thu, 07 Jul 11 08:48:38 -0700
