Computer Chess Club Archives

Search

Terms
Messages

Subject: Re: Another memory latency test

Author: Robert Hyatt
Date: 11:52:10 07/20/03
On July 19, 2003 at 00:36:26, Peter McKenzie wrote:

>On July 18, 2003 at 23:45:16, Robert Hyatt wrote:
>
>>On July 18, 2003 at 21:58:18, J. Wesley Cleveland wrote:
>>
>>>On July 18, 2003 at 21:17:14, Robert Hyatt wrote:
>>>
>>>>On July 18, 2003 at 15:21:35, J. Wesley Cleveland wrote:
>>>>
>>>>>On July 17, 2003 at 18:25:51, Robert Hyatt wrote:
>>>>>
>>>>>>On July 17, 2003 at 17:35:33, Dieter Buerssner wrote:
>>>>>>
>>>>>[snip]
>>>>>>>
>>>>>>>I cannot find any randomness in the reads of lm-bench (I downloaded latest
>>>>>>>stable source today, not the experimental version, available, too). If it would
>>>>>>>do random reads, it would have no way to avoid the problem with the TLBs you
>>>>>>>explained.
>>>>>>
>>>>>>4M pages solves it for at least 250mb worth of RAM.  But then again, _no_ chess
>>>>>>program depends on purely random memory accesses to blow out the TLB.  The only
>>>>>>truly random accesses I do are the regular hashing and pawn hashing, which
>>>>>>both total to significantly less than the total nodes I search.  Which means
>>>>>>the TLB penalty is not even 1% of my total run time.  Probably closer to
>>>>>>.01% - .05%.
>>>>>>
>>>>>>I ignore that.
>>>>>
>>>>>Why do you think it is that low? I get ~20-30% of nodes have hash probes with
>>>>>crafty.
>>>>
>>>>
>>>>Look at the code.
>>>I not only looked at the code. I *instrumented it*. I won't have complete
>>>results until Monday, but it appears that crafty spends 3-5% of its total time
>>>inside hashprobe on my (slow) machine and a prefetch could reduce that by about
>>>half.
>>>
>>>>Crafty probes memory _once_ for a hash probe.  That
>>>>introduces a memory access penalty once per node in the basic search,
>>>>less than once per node in the q-search (I only probe phash there and I
>>>>don't probe it but about 25% of the q-search nodes I visit).
>>>
>>>If you had read whai I wrote, you would see I said crafty does a hash probe
>>>20-30% of its total nodes.
>>
>>OK.  I clearly mis-read what you meant.  the 20-30% was eye-catching as that
>>is a pretty common hash hit percentage as well...
>>
>>
>>>
>>>>As a result, you get less than one probe per node searched.  A node searched
>>>>requires something on the order of 3000-5000 instructions.  What percentage
>>>>of that 3K-5K instruction timing is that single hash probe?  Almost zero.
>>>
>>>Except that a fast machine may do these 3-5K instructions in <1usec. A cache
>>>miss + a TLB miss may take 300-400 ns. I would not call 30% almost 0.
>>
>>You are missing my point.  In the position(s) you tested, you saw 20-30%
>
>Bob, there is really no need for the somewhat hostile 'You are missing my
>point', especially as its clear you have been somewhat careless in paying
>attention to Mr Cleveland's politely made points.

I don't see _anything_ hostile in "you are missing my point."  I didn't say
"you are too dumb to understand my point" or anything similar.  "you are
missing my point" could mean (a) you overlooked something; (b) you misunderstood
something;  (c) I didn't explain it clearly;  (d) any of another dozen things.

If that sounds "hostile" it wasn't intentional.  However, it doesn't sound
the least bit hostile to me.

>
>I understand that Vincent has gotten your back up on this issue, but if you can
>forget all the crap for a minute I think there might be something of value in
>all this stuff.
>
>Its just possible that your statement of:
>"Which means the TLB penalty is not even 1% of my total run time.  Probably
>closer to .01% - .05%."
>is a bit off the mark.  No shame if it is.
>
>Here's hoping we can find out in a scientific manner.

I've measured this in the past.  In fact, if you look back thru the
archive, you'll note that when I changed from the two-table approach to
a single probe to get two entries (exactly the same approach, but with
one less random memory access) the improvement was minimal.  In the 1%
or less range if I recall correctly.  I don't think a _single_ memory
read is going to affect a thing.

>
>cheers,
>Peter
>
>
>>hash probes.  That means one probe for every 3-5 nodes.  At 1M nodes
>>per second, that is 200K-300K probes per second.  If you measure the
>>time spent in searching a single node, multiply that by 3-5X, then compare
>>that to the hash probe time, the time spent probing the hash table is low.
>>
>>Note that your 5% is _not_ the total time used to probe the table.  It is
>>the time to probe the table, and do it _twice_ although the second probe
>>doesn't have any memory access penalty associated with it in most cases.
>>
>>So a big percent of that 5% is doing the actual work done in HashProbe(),
>>rather than being all memory access penalty...
>>
>>
>>
>>
>>
>>
>>>>
>>>>Ignore hits and misses, that is not the issue here.  The issue is the cost of
>>>>doing the probe itself, which is essentially zero.
>>>>
>>>>
>>>>
>>>>
>>>>>If you are getting 1m nodes/sec, then this is a probe every 3-5 usec.
>>>>>With a very large hash table and 4K pages, the large majority of these will
>>>>>cause a TLB miss. At 200 nsec each (a guess), this could be up to 5% of your
>>>>>total run time.
>>>>
>>>>See above.  I don't really probe once for every node.
>>>
>>>See above. I never said you did.
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.