Computer Chess Club Archives

Subject: Re: Matt Taylor's magic de Bruijn Constant

Author: Robert Hyatt
Date: 06:35:16 07/15/03

On July 15, 2003 at 06:26:54, Vincent Diepeveen wrote:

>On July 14, 2003 at 16:52:50, Gerd Isenberg wrote:
>
>>>>Hi Vincent,
>>>>
>>>>puhh... that's about 1/2 microsecond. I remember the days with
>>>>2MHz - 8085 or Z80 CPU - can't believe it. A few questions...
>>>
>>>
>>>
>>>Don't believe it because it is _wrong_.  Run "lm-bench" on your computer.
>>>It will very accurately measure random access latency.  The slowest I have
>>>seen is 150ns on my dual, using registered DDRAM.  My laptop uses SDRAM and
>>>clocks in around 120ns.  My quad xeons are all around 125ns.
>>>
>>>I've not seen any 400+ ns numbers although it is very possible that rambus
>>>might be that slow on latency, although it is very fast on bandwidth.
>>
>>
>>
>>Hi Bob,
>>
>>thanks for the prompt answer.
>>I guess Vincent's "worst case" value was related to rambus ;-)
>
>No they are related to hashtable lookups.
>
>Bob's latencies are related to sequential reads, for example when scientists
>stream 10 gigabytes sequentially.
>
>That is *way* faster than a random lookup in memory. For random lookups the memory
>must first be opened. That is *huge* latency.
>
>So for hashtable lookups use my numbers. See the source code. Run it yourself.
>
>Bob is refusing to do so because he finds sequential latency is closer to the
>truth of what latency is.

No, Bob is refusing to run your code because you don't know what you are
doing.  There are well-known programs for measuring random-access latency;
lm-bench is a good one.
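
For readers following the sequential-versus-random dispute, here is a minimal
sketch of the dependent-load ("pointer chasing") idea that latency benchmarks of
this kind generally rely on. It is not lm-bench's code and not Vincent's; the
function names, the seed, and the use of Sattolo's shuffle to build a single
random cycle are all assumptions made for illustration.

```python
import random
import time


def build_sequential_chain(n):
    """Chain where slot i points to slot i+1 (wrapping): a sequential walk."""
    return [(i + 1) % n for i in range(n)]


def build_random_chain(n, seed=0):
    """Chain that visits all n slots in random order before repeating.

    Sattolo's shuffle produces a permutation consisting of one single cycle,
    so the walk cannot get stuck in a short loop that stays in cache.
    """
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i, never i itself
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def chase(nxt, steps, start=0):
    """Follow the chain; each load depends on the result of the previous one."""
    i = start
    for _ in range(steps):
        i = nxt[i]
    return i


def ns_per_load(nxt, steps):
    """Average wall-clock nanoseconds per step of the chase."""
    t0 = time.perf_counter()
    chase(nxt, steps)
    return (time.perf_counter() - t0) * 1e9 / steps


if __name__ == "__main__":
    n = 1 << 22  # hypothetical size; should exceed the cache to reach memory
    print("sequential:", ns_per_load(build_sequential_chain(n), n))
    print("random:    ", ns_per_load(build_random_chain(n), n))
```

In Python the interpreter overhead dominates each step, so the absolute numbers
say little about the hardware; at most the gap between the two chains is
suggestive. The same loop ported to C over an array much larger than the cache
is what would give figures comparable to the ones quoted in this thread.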


>
>I do not.
>
>I care what it takes to do a hashtable lookup.
>
>Bob doesn't.

I do care and I know how long it takes, too.  Something you can't say,
apparently.

>
>
>>>>
>>>
>>>>
>>>>I'm not familiar with dual architectures. Is it a kind of shared memory via
>>>>the PCI bus? How do you access such RAM - are there some alloc-like API functions?
>>>>What happens if one processor writes this memory through its cache - what about
>>>>possible cached copies of this address in the other processor, or in general, how
>>>>do the several processor caches synchronise?
>>>>I guess each processor has its own local main memory.
>>>>
>>>
>>>
>>>
>>>No.  Each processor sits on the same bus with memory.  So both can access
>>>it independently.  However, cache coherency is a problem, but in the Intel
>>>world it is handled by some clever cache design so that the cache controllers
>>>are aware of what is being done by the "other cache" and know when the other
>>>cache modifies a value that is in the local cache.  It's messy, but it works.
>>>
>>>Caches still use write-back update policy so that memory is not updated until
>>>the cache line (Modified cache line) is about to be overwritten.  However, if
>>>two caches have the same cache line (memory addresses) and one modifies any part
>>>of the cache line, the other invalidates its copy so the next read will refresh
>>>things correctly.
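
The write-back and invalidate behaviour described above can be sketched as a toy
model. This is a simplified MSI-style simulation under stated assumptions, not
Intel's actual protocol: the class names `Bus` and `Cache`, the per-address
"lines", and the omission of eviction are all illustrative choices.

```python
class Bus:
    """Shared bus with main memory and every cache that snoops on it."""

    def __init__(self):
        self.memory = {}   # addr -> value
        self.caches = []


class Cache:
    """Toy write-back cache; each line is [state, value], state 'M' or 'S'."""

    def __init__(self, bus):
        self.bus = bus
        self.lines = {}    # an absent address means Invalid
        bus.caches.append(self)

    def _others(self):
        return [c for c in self.bus.caches if c is not self]

    def _snoop_read(self, addr):
        """Another cache reads: write back a Modified line, keep it Shared."""
        line = self.lines.get(addr)
        if line and line[0] == "M":
            self.bus.memory[addr] = line[1]
            line[0] = "S"

    def _snoop_write(self, addr):
        """Another cache writes: drop our copy, writing it back if Modified."""
        line = self.lines.pop(addr, None)
        if line and line[0] == "M":
            self.bus.memory[addr] = line[1]

    def read(self, addr):
        if addr not in self.lines:               # miss: fetch a fresh copy
            for other in self._others():
                other._snoop_read(addr)
            self.lines[addr] = ["S", self.bus.memory.get(addr, 0)]
        return self.lines[addr][1]

    def write(self, addr, value):
        for other in self._others():             # invalidate remote copies
            other._snoop_write(addr)
        self.lines[addr] = ["M", value]          # memory is NOT updated yet


if __name__ == "__main__":
    bus = Bus()
    a, b = Cache(bus), Cache(bus)
    bus.memory[0x40] = 1
    a.read(0x40)
    b.read(0x40)
    a.write(0x40, 2)
    print("b still holds the line:", 0x40 in b.lines)
    print("memory before b re-reads:", bus.memory[0x40])
    print("b reads:", b.read(0x40))
```

Eviction of a Modified line, which is the point where a real write-back cache
updates memory on its own, is left out to keep the sketch short.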
>>>
>>
>>Even more complicated with quads and more...
>>I guess Opteron's Hyper Transport Technology is another approach.
>>
>>>
>>>
>>>
>>>>Do you know the read latencies of single processor P4 or K7 with state of the
>>>>art chipsets?
>>>
>>>
>>>Typical numbers are in the 120-150ns range.  Lower for non-registered type
>>>memory.  Registered memory is mainly used in duals that are set up as servers,
>>>for higher reliability.
>>>
>>>Aaron has a sub-75ns latency machine that is overclocked.  That's the fastest
>>>PC latency I have ever seen.  In fact, it is probably the fastest latency of
>>>any kind I have seen, period.
>>>
>>>
>>>
>>>
>>>>
>>>>1.) if data is already in the first-level cache
>>>
>>>This is a one-cycle deal.
>>>
>>>
>>
>>Aha, so that one cycle explains the opcode latency difference of most
>>instructions with register versus memory operand.
>>
>>>
>>>>2.) if data is in the second-level cache but not in the first
>>>
>>>This is something like 6 cycles but I don't think there is a standard
>>>"number" here since processor speeds vary so much.
>>>
>>>
>>>
>>>>3.) in worst case, if data is only in main memory but in no cache
>>>
>>>125ns is a good first approximation.
>>>
>>>You can answer _all_ of the above by running lm-bench.  It will tell
>>>you each one of those numbers, plus others.
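
To put the nanosecond figures above next to the cycle counts, a small conversion
helps. The sketch below only does the arithmetic; the 2 GHz and 3 GHz clock
rates are assumptions chosen for illustration, while the 125 ns, 1/2
microsecond and 2 MHz values are the ones mentioned in this thread.

```python
def ns_to_cycles(latency_ns, clock_mhz):
    """Convert a latency in nanoseconds to clock cycles at the given clock.

    One cycle lasts 1000 / clock_mhz nanoseconds, so
    cycles = latency_ns * clock_mhz / 1000.
    """
    return latency_ns * clock_mhz / 1000.0


if __name__ == "__main__":
    # 1/2 microsecond on a 2 MHz 8085 or Z80: a single clock cycle.
    print(ns_to_cycles(500, 2))
    # 125 ns main-memory latency on an assumed 2 GHz or 3 GHz CPU.
    print(ns_to_cycles(125, 2000))
    print(ns_to_cycles(125, 3000))
```

Seen this way, a main-memory miss costs a few hundred cycles on a
gigahertz-class processor, against the single-digit cycle counts quoted for
cache hits.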
>>>
>>
>>I will try it.
>>
>>Cheers,
>>Gerd
>>
>>
>>
>>>
>>>
>>>
>>>>
>>>>Thanks in advance,
>>>>Gerd
Last modified: Thu, 15 Apr 21 08:11:13 -0700