Computer Chess Club Archives



Subject: Re: Source code to measure it - there is something wrong

Author: Robert Hyatt

Date: 15:27:54 07/17/03



On July 17, 2003 at 17:37:33, Keith Evans wrote:

>On July 17, 2003 at 17:26:50, Robert Hyatt wrote:
>
>>On July 17, 2003 at 02:17:29, Gerd Isenberg wrote:
>>
>>>>>And, after all, we use virtual memory nowadays. Doesn't this include one more
>>>>>indirection (done by hardware)? Without knowing much about it, I wouldn't be
>>>>>surprised if the hardware time for those indirections is needed more often with
>>>>>a random-access style pattern.
>>>>
>>>>You are talking about the TLB.
>>>>
>>>>The memory mapping hardware needs two memory references to compute a real
>>>>address before it can be accessed.  The TLB keeps the most recent N of these
>>>>translations around.  If you go wild with random accessing, you will _certainly_
>>>>make memory latency 3x what it should be, because the TLB entries are 100%
>>>>useless.  Of course that is not a sensible thing to measure, because 90+ percent
>>>>of the memory references in a chess program are _not_ scattered all over memory.
>>>>
>>>
>>>Aha, that's interesting. So memory latency is really the time between placing
>>>the physical address on the bus and getting the data back, _and_ does not
>>>include the translation from virtual to physical addresses via the TLB
>>>(Translation Look-aside Buffer)?
>>>
>>>So Vincent's benchmark doesn't seem that bad for getting a feeling for the
>>>"worst case" virtual-address latency - which is what hashtable reads are
>>>likely to see.
>>
>>Sure.  But that simply isn't "memory latency".  And, as I mentioned in another
>>post, the PC supports 4K or 4M pages.  4M pages means a 62-entry TLB is good
>>for over 1/4 gig of RAM, accessed randomly, with _no_ TLB penalty.
>>
>>The X86 also supports a three-level map, which would add yet another memory
>>reference to the virtual-to-real translation, should a system use it.  I'd
>>think a saner approach would be to step up to the 4M page size before going
>>to that huge map overhead.
>>
>>BTW, lmbench says my Xeon has 62 TLB entries.  I haven't verified that with
>>Intel, however.
>>
>>>
>>>Gerd
>
>So I guess that you can make your hash tables too big ;-)
>
>If this is the cause of the discrepancy, can't those other benchmarks be run
>with, say, a 250 MB array, to see a reduced latency?


Only if the O/S cooperates and switches to 4MB pages.  If it sticks with 4KB
pages, the table has to be small, small enough that it will most likely fit in
cache, which will wreck the measurement.
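
For anyone who wants to see the effect directly, here is a rough sketch in C
of the kind of pointer-chasing measurement being discussed, similar in spirit
to what lmbench's lat_mem_rd does.  All names, sizes, and iteration counts are
made up for illustration, and the TLB-reach numbers in the comments assume the
62-entry figure from this thread.  Each load in the timed loop depends on the
previous one, so the loop time approximates the load-to-use latency; once the
buffer is much bigger than the caches, and bigger than the TLB reach, every
step also pays for page-table walks.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define STEPS 10000000UL           /* dependent loads per timing run */

/* Time one dependent pointer dereference, in nanoseconds, for a
   buffer of the given size.  The buffer is linked into one random
   cycle so the walk touches the whole buffer in random order. */
static double chase(size_t bytes)
{
    size_t n = bytes / sizeof(void *), i;
    void **buf = malloc(n * sizeof(void *));
    size_t *perm = malloc(n * sizeof(size_t));

    /* Fisher-Yates shuffle of 0..n-1 gives a random visiting order. */
    for (i = 0; i < n; i++)
        perm[i] = i;
    for (i = n - 1; i > 0; i--) {
        size_t j = rand() % (i + 1);   /* crude randomness, fine here */
        size_t t = perm[i]; perm[i] = perm[j]; perm[j] = t;
    }
    for (i = 0; i < n - 1; i++)
        buf[perm[i]] = &buf[perm[i + 1]];
    buf[perm[n - 1]] = &buf[perm[0]];

    void **p = &buf[perm[0]];
    free(perm);

    /* Every load depends on the previous one, so out-of-order
       execution cannot overlap the misses. */
    struct timespec t0, t1;
    unsigned long s;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (s = 0; s < STEPS; s++)
        p = (void **)*p;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (p == NULL)                 /* keep p live; never true */
        puts("impossible");
    free(buf);

    return ((t1.tv_sec - t0.tv_sec) * 1e9
            + (t1.tv_nsec - t0.tv_nsec)) / (double)STEPS;
}

int main(void)
{
    /* With 4KB pages, 62 TLB entries reach only ~248KB, so every
       size below blows out the TLB; with 4MB pages they reach
       ~248MB, so the extra walk cost should only show at the top. */
    size_t mb;
    for (mb = 1; mb <= 256; mb *= 2)
        printf("%4lu MB: %.1f ns per dependent load\n",
               (unsigned long)mb, chase(mb * 1024UL * 1024UL));
    return 0;
}

Compile with something like "gcc -O2" and watch the ns-per-load climb as the
array falls out of each cache level.  Whether the largest sizes also show the
extra page-walk cost depends on whether the O/S actually handed out large
pages, which is exactly the point above: with 4KB pages the only way to avoid
the TLB penalty is a table small enough to sit in cache.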


