Computer Chess Club Archives


Subject: Re: Another memory latency test

Author: Dieter Buerssner

Date: 14:35:33 07/17/03



On July 17, 2003 at 17:05:00, Robert Hyatt wrote:

>On July 17, 2003 at 09:16:10, Dieter Buerssner wrote:
>
>>I use an inner loop, that just translates to a stream of move memory to register
>>instructions (one for each access). Here are some results (source at the end of
>>the posting, not well tested, please report the errors/flaws ...)
>>
>
>The main flaw is you are not testing "memory latency" here.  If you look at
>how the X86 does virtual memory, it is a two-level memory lookup.  To avoid
>this penalty, the TLB holds recent virtual-to-real address mappings, but the
>TLB is not huge.  On my dual xeon, lm-bench reports that the TLB holds the
>most recent 62 virtual-to-real translations.  What you are measuring is
>at least _two_ memory latency cycles, one or two to do the virtual to real
>address translation, then another to actually fetch the data.
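To put a number on "not huge": a minimal sketch of the TLB reach arithmetic. The 62 entries are the figure quoted above; the 4 KiB page size is an assumption (the usual x86 small page), and the function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes of address space that can be mapped without a TLB miss,
   assuming every entry maps one page of page_bytes bytes. */
static size_t tlb_reach_bytes(size_t entries, size_t page_bytes)
{
    assert(page_bytes > 0);
    return entries * page_bytes;
}

/* Nonzero if a randomly accessed working set is larger than the TLB reach,
   so that most accesses can be expected to need a page-table walk as well. */
static int exceeds_tlb_reach(size_t working_set_bytes,
                             size_t entries, size_t page_bytes)
{
    return working_set_bytes > tlb_reach_bytes(entries, page_bytes);
}
```

With 62 entries and 4096-byte pages, tlb_reach_bytes gives 253952 bytes (248 KiB), far less than a typical hash table.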

Please also see my answer to Gerd in this thread. You may also have noticed
(from the earlier thread) that I was aware of this (I was the first one to
mention virtual memory at all).

From your former posts:
---
Bob:
No.  lm-bench does _random_ reads and computes the _random-access_
latency.
---

I cannot find any randomness in the reads of lm-bench (I downloaded the latest
stable source today, not the experimental version, which is also available). If
it did random reads, it would have no way to avoid the TLB problem you
explained.
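To make the distinction concrete, here is a rough sketch of two ways to set up a chain of dependent loads: one in random order, one with a fixed stride. It is neither the lm-bench source nor the code from my earlier posting; the function names are invented for this example, and rand() is only a stand-in for a better generator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Fill next[] with a random permutation that forms ONE cycle of length n
   (Sattolo's variant of the shuffle), so a chase visits every slot before
   it returns to the start. rand() may have a small range on some systems;
   a real test with a large n would want a better generator. */
static void build_random_cycle(size_t *next, size_t n, unsigned seed)
{
    assert(n >= 2);
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    srand(seed);
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;   /* 0 <= j < i, never i itself */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
}

/* Fixed-stride chain for comparison. It is a single cycle only if
   stride and n have no common factor. */
static void build_stride_cycle(size_t *next, size_t n, size_t stride)
{
    for (size_t i = 0; i < n; i++)
        next[i] = (i + stride) % n;
}

/* Follow the chain for `steps` loads. Each load depends on the previous
   one, so the loads cannot overlap. */
static size_t chase(const size_t *next, size_t start, size_t steps)
{
    size_t i = start;
    while (steps--)
        i = next[i];
    return i;
}
```

Timing chase() over a table much larger than the caches, once for each kind of chain, would show how much of the measured time depends on the access order. The returned index should be used (printed, for instance) so the compiler cannot drop the loop.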

And (Bob's text is the unquoted part):
---
>>Host                 OS   Mhz   L1 $   L2 $    Main mem    Guesses
>>--------- -------------   ---   ----   ----    --------    -------
>>scrappy    Linux 2.4.20   744 4.0370 9.4300       130.2
>>
>>>In the lmbench paper they have a nice graph like this.
>>
>>
>>Is the above what you want?
>
>I think that it's as close as you're going to get. The most important thing is
>that 130 [ns] is the largest number. And wouldn't that be a little bit
>pessimistic even for chess hash tables?


I don't think so, although, in the case of crafty, the actual latency is
about 1/3 of that, since I read three positions and you would amortize the
latency over those three positions rather than just over one.
---

This also seems to imply that you equated latency with the value Vincent and I
measured. In your current answer to my post, you seem to switch context.

Also, you may remember that I suggested to you the scheme you now use in Crafty
(three contiguous cells in the hash table instead of two tables, one of them
twice the size of the other).
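For readers who have not seen such a layout, a minimal sketch of a table with three contiguous cells per bucket follows. It is not Crafty's code: the cell layout, the use of key 0 for "empty", and the replacement rule (reuse a matching or empty cell, otherwise overwrite the first one) are placeholders chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CELLS_PER_BUCKET 3

typedef struct {
    uint64_t key;    /* position signature; 0 means "empty" (assumption) */
    uint64_t data;   /* packed score/depth/move; layout left open here */
} Cell;

typedef struct {
    Cell cell[CELLS_PER_BUCKET];   /* adjacent in memory */
} Bucket;

/* Look at all three cells of the one bucket the key maps to. */
static Cell *probe(Bucket *table, size_t n_buckets, uint64_t key)
{
    assert(n_buckets > 0);
    Bucket *b = &table[key % n_buckets];
    for (int i = 0; i < CELLS_PER_BUCKET; i++)
        if (b->cell[i].key == key)
            return &b->cell[i];
    return NULL;
}

/* Placeholder replacement rule, see the text above. */
static void store(Bucket *table, size_t n_buckets, uint64_t key, uint64_t data)
{
    assert(n_buckets > 0);
    Bucket *b = &table[key % n_buckets];
    Cell *victim = &b->cell[0];
    for (int i = 0; i < CELLS_PER_BUCKET; i++) {
        if (b->cell[i].key == key || b->cell[i].key == 0) {
            victim = &b->cell[i];
            break;
        }
    }
    victim->key = key;
    victim->data = data;
}
```

The point of the layout is that one probe touches one small contiguous region instead of two unrelated places in memory. Whether a bucket (48 bytes here) falls within a single cache line depends on the line size and on how the table is aligned.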

>To compute a _real_ raw memory latency number, you have to avoid overwriting
>the TLB too badly.  Otherwise the latency is inflated by the MMU overhead
>that isn't actually hit on "normal application" that badly.

Sure, I don't doubt that what my code measures is not the "real" latency.
But that number seems rather uninteresting from a (chess engine) programmer's
point of view. I guess many database applications use hashing schemes and see
random-access latencies similar to those of chess engines. Of course, I can also
imagine many applications where this won't play a role (say, a numerical
calculation that typically accesses vectors in order, although other numerical
applications may have big jumps, too).

Regards,
Dieter




Last modified: Thu, 15 Apr 21 08:11:13 -0700
