Computer Chess Club Archives


Subject: Re: Another memory latency test

Author: Peter McKenzie
Date: 21:36:26 07/18/03
On July 18, 2003 at 23:45:16, Robert Hyatt wrote:

>On July 18, 2003 at 21:58:18, J. Wesley Cleveland wrote:
>
>>On July 18, 2003 at 21:17:14, Robert Hyatt wrote:
>>
>>>On July 18, 2003 at 15:21:35, J. Wesley Cleveland wrote:
>>>
>>>>On July 17, 2003 at 18:25:51, Robert Hyatt wrote:
>>>>
>>>>>On July 17, 2003 at 17:35:33, Dieter Buerssner wrote:
>>>>>
>>>>[snip]
>>>>>>
>>>>>>I cannot find any randomness in the reads of lm-bench (I downloaded latest
>>>>>>stable source today, not the experimental version, available, too). If it would
>>>>>>do random reads, it would have no way to avoid the problem with the TLBs you
>>>>>>explained.
>>>>>
>>>>>4MB pages solve it for at least 250MB worth of RAM.  But then again, _no_ chess
>>>>>program depends on purely random memory accesses to blow out the TLB.  The only
>>>>>truly random accesses I do are the regular hashing and pawn hashing, which
>>>>>both total to significantly less than the total nodes I search.  Which means
>>>>>the TLB penalty is not even 1% of my total run time.  Probably closer to
>>>>>.01% - .05%.
>>>>>
>>>>>I ignore that.
>>>>
>>>>Why do you think it is that low? I find that ~20-30% of nodes do hash probes
>>>>with crafty.
>>>
>>>
>>>Look at the code.
>>I not only looked at the code. I *instrumented it*. I won't have complete
>>results until Monday, but it appears that crafty spends 3-5% of its total time
>>inside hashprobe on my (slow) machine and a prefetch could reduce that by about
>>half.
>>
>>>Crafty probes memory _once_ for a hash probe.  That
>>>introduces a memory access penalty once per node in the basic search,
>>>less than once per node in the q-search (I only probe phash there, and I
>>>probe it in only about 25% of the q-search nodes I visit).
>>
>>If you had read what I wrote, you would see I said crafty does a hash probe
>>20-30% of its total nodes.
>
>OK.  I clearly misread what you meant.  The 20-30% was eye-catching as that
>is a pretty common hash hit percentage as well...
>
>
>>
>>>As a result, you get less than one probe per node searched.  A node searched
>>>requires something on the order of 3000-5000 instructions.  What percentage
>>>of that 3K-5K instruction timing is that single hash probe?  Almost zero.
>>
>>Except that a fast machine may do these 3-5K instructions in <1usec. A cache
>>miss + a TLB miss may take 300-400 ns. I would not call 30% almost 0.
>
>You are missing my point.  In the position(s) you tested, you saw 20-30%

Bob, there is really no need for the somewhat hostile 'You are missing my
point', especially as it's clear you have been somewhat careless in paying
attention to Mr Cleveland's politely made points.

I understand that Vincent has gotten your back up on this issue, but if you can
forget all the crap for a minute I think there might be something of value in
all this stuff.

It's just possible that your statement of:
"Which means the TLB penalty is not even 1% of my total run time.  Probably
closer to .01% - .05%."
is a bit off the mark.  No shame if it is.

Here's hoping we can find out in a scientific manner.

cheers,
Peter


>hash probes.  That means one probe for every 3-5 nodes.  At 1M nodes
>per second, that is 200K-300K probes per second.  If you measure the
>time spent in searching a single node, multiply that by 3-5X, then compare
>that to the hash probe time, the time spent probing the hash table is low.
>
>Note that your 5% is _not_ the total time used to probe the table.  It is
>the time to probe the table, and do it _twice_ although the second probe
>doesn't have any memory access penalty associated with it in most cases.
>
>So a big percent of that 5% is doing the actual work done in HashProbe(),
>rather than being all memory access penalty...
>
>
>
>
>
>
>>>
>>>Ignore hits and misses, that is not the issue here.  The issue is the cost of
>>>doing the probe itself, which is essentially zero.
>>>
>>>
>>>
>>>
>>>>If you are getting 1M nodes/sec, then this is a probe every 3-5 usec.
>>>>With a very large hash table and 4K pages, the large majority of these will
>>>>cause a TLB miss. At 200 nsec each (a guess), this could be up to 5% of your
>>>>total run time.
>>>
>>>See above.  I don't really probe once for every node.
>>
>>See above. I never said you did.
Re: Another memory latency test Robert Hyatt 11:52:10 07/20/03
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.