Author: Robert Hyatt
Date: 06:35:16 07/15/03
On July 15, 2003 at 06:26:54, Vincent Diepeveen wrote:

>On July 14, 2003 at 16:52:50, Gerd Isenberg wrote:
>
>>>>Hi Vincent,
>>>>
>>>>puhh... that's about 1/2 microsecond. I remember the days with
>>>>2MHz - 8085 or Z80 CPU - can't believe it. A few questions...
>>>
>>>Don't believe it because it is _wrong_. Run "lm-bench" on your computer.
>>>It will very accurately measure random access latency. The slowest I have
>>>seen is 150ns on my dual, using registered DDRAM. My laptop uses SDRAM and
>>>clocks in around 120ns. My quad xeons are all around 125ns.
>>>
>>>I've not seen any 400+ ns numbers although it is very possible that rambus
>>>might be that slow on latency, although it is very fast on bandwidth.
>>
>>Hi Bob,
>>
>>thanks for the prompt answer.
>>I guess Vincent's "worst case" value was related to rambus ;-)
>
>No, they are related to hashtable lookups.
>
>Bob's latencies are related to sequential reads. For example, when scientists
>stream 10 gigabytes in a sequential way.
>
>That is *way* faster than a random lookup in memory. For random lookups the
>memory must first get opened. That is *huge* latency.
>
>So for hashtable lookups use my numbers. See the source code. Run it yourself.
>
>Bob is refusing to do so because he finds sequential latency is closer to the
>truth of what latency is.

No, Bob is refusing to run your code because you don't know what you are
doing. There are well-known programs for measuring random latency; lm-bench
is one good one.

>
>I do not.
>
>I care what it takes to do a hashtable lookup.
>
>Bob doesn't.

I do care and I know how long it takes, too. Something you can't say,
apparently.

>
>>>>I'm not familiar with dual-architectures. Is it a kind of shared memory via
>>>>pci-bus? How do you access such ram - are there some alloc-like api-functions?
>>>>What happens if one processor writes this memory through cache - what about
>>>>possible cache copies of this address in the other processor, or in general how
>>>>do the several processor caches synchronise?
>>>>I guess each processor has its own local main-memory.
>>>>
>>>
>>>No. Each processor sits on the same bus with memory. So both can access
>>>it independently. However, cache coherency is a problem, but in the Intel
>>>world it is handled by some clever cache design so that the cache controllers
>>>are aware of what is being done by the "other cache" and know when the other
>>>cache modifies a value that is in the local cache. It's messy, but it works.
>>>
>>>Caches still use write-back update policy so that memory is not updated until
>>>the cache line (Modified cache line) is about to be overwritten. However, if
>>>two caches have the same cache line (memory addresses) and one modifies any of
>>>the cache line, the other invalidates its copy so the next read will refresh
>>>things correctly.
>>>
>>
>>Even more complicated with quads and more...
>>I guess Opteron's Hyper Transport Technology is another approach.
>>
>>>
>>>>Do you know the read latencies of single processor P4 or K7 with state of the
>>>>art chipsets?
>>>
>>>Typical numbers are in the 120-150ns range. Lower for non-registered type
>>>memory. Registered memory is mainly used in duals that are set up as servers,
>>>for higher reliability.
>>>
>>>Aaron has a sub-75ns latency machine that is overclocked. That's the fastest
>>>PC latency I have ever seen. In fact, it is probably the fastest latency of
>>>any kind I have seen, period.
>>>
>>>>
>>>>1.) if data is already in 1. level cache
>>>
>>>This is a one-cycle deal.
>>>
>>
>>Aha, so that one cycle explains the opcode latency difference of most
>>instructions with register versus memory operand.
>>
>>>
>>>>2.) if data is in 2. level cache but not in 1.
>>>
>>>This is something like 6 cycles but I don't think there is a standard
>>>"number" here since processor speeds vary so much.
>>>
>>>>3.) in worst case, if data is only in main memory but in no cache
>>>
>>>125ns is a good first approximation.
>>>
>>>You can answer _all_ of the above by running lm-bench. It will tell
>>>you each one of those numbers, plus others.
>>>
>>
>>I will try it.
>>
>>Cheers,
>>Gerd
>>
>>>
>>>>
>>>>Thanks in advance,
>>>>Gerd
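
For readers who want to check the random-versus-sequential question above on their own machine, here is a minimal pointer-chasing sketch in C. It is not lm-bench's code and not the program Vincent refers to; it only illustrates the general dependent-load technique that latency tools commonly rely on. The names build_random_cycle and chase_ns_per_load, the xorshift generator, and the use of clock() are all assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Small xorshift64 generator (an assumption of this sketch), used so the
   shuffle does not depend on a possibly tiny RAND_MAX. */
static unsigned long long xorshift64(unsigned long long *state)
{
    unsigned long long x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

/* Link n slots into ONE random cycle (Sattolo's algorithm), so that
   following next[] from slot 0 visits every slot exactly once before
   coming back to slot 0. */
static void build_random_cycle(size_t *next, size_t n, unsigned long long seed)
{
    unsigned long long state = seed ? seed : 1ULL; /* xorshift needs nonzero */
    size_t i, j, tmp;

    assert(n > 1);
    for (i = 0; i < n; i++)
        next[i] = i;
    for (i = n - 1; i > 0; i--) {
        j = (size_t)(xorshift64(&state) % i); /* j < i: Sattolo, not Fisher-Yates */
        tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
}

/* Follow the chain for `steps` loads and return the average time per load
   in nanoseconds.  Each load address depends on the previous load's result,
   so the loads cannot overlap.  The final index is stored through `end` so
   the compiler cannot discard the loop. */
static double chase_ns_per_load(const size_t *next, size_t steps, size_t *end)
{
    size_t p = 0, i;
    clock_t t0, t1;

    t0 = clock();
    for (i = 0; i < steps; i++)
        p = next[p];
    t1 = clock();

    *end = p;
    return 1e9 * (double)(t1 - t0) / (double)CLOCKS_PER_SEC / (double)steps;
}
```

To use it, allocate a table several times larger than the largest cache, call build_random_cycle on it, and time a few million steps with chase_ns_per_load. Filling the same table with next[i] = (i + 1) % n instead gives a sequential walk to compare against. The result includes TLB misses and loop overhead, and clock() is coarse, so treat the number as a rough estimate rather than a substitute for a dedicated tool.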
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.