Computer Chess Club Archives




Subject: Re: Another memory latency test

Author: J. Wesley Cleveland

Date: 18:58:18 07/18/03

Go up one level in this thread

On July 18, 2003 at 21:17:14, Robert Hyatt wrote:

>On July 18, 2003 at 15:21:35, J. Wesley Cleveland wrote:
>>On July 17, 2003 at 18:25:51, Robert Hyatt wrote:
>>>On July 17, 2003 at 17:35:33, Dieter Buerssner wrote:
>>>>I cannot find any randomness in the reads of lm-bench (I downloaded latest
>>>>stable source today, not the experimental version, available, too). If it would
>>>>do random reads, it would have no way to avoid the problem with the TLBs you
>>>4M pages solves it for at least 250mb worth of RAM.  But then again, _no_ chess
>>>program depends on purely random memory accesses to blow out the TLB.  The only
>>>truly random accesses I do are the regular hashing and pawn hashing, which
>>>both total to significantly less than the total nodes I search.  Which means
>>>the TLB penalty is not even 1% of my total run time.  Probably closer to
>>>.01% - .05%.
>>>I ignore that.
>>Why do you think it is that low? I get ~20-30% of nodes have hash probes with
>Look at the code.
I not only looked at the code. I *instrumented it*. I won't have complete
results until Monday, but it appears that crafty spends 3-5% of its total time
inside hashprobe on my (slow) machine and a prefetch could reduce that by about

>Crafty probes memory _once_ for a hash probe.  That
>introduces a memory access penalty once per node in the basic search,
>less than once per node in the q-search (I only probe phash there and I
>don't probe it but about 25% of the q-search nodes I visit).

If you had read whai I wrote, you would see I said crafty does a hash probe
20-30% of its total nodes.

>As a result, you get less than one probe per node searched.  A node searched
>requires something on the order of 3000-5000 instructions.  What percentage
>of that 3K-5K instruction timing is that single hash probe?  Almost zero.

Except that a fast machine may do these 3-5K instructions in <1usec. A cache
miss + a TLB miss may take 300-400 ns. I would not call 30% almost 0.
>Ignore hits and misses, that is not the issue here.  The issue is the cost of
>doing the probe itself, which is essentially zero.
>>If you are getting 1m nodes/sec, then this is a probe every 3-5 usec.
>>With a very large hash table and 4K pages, the large majority of these will
>>cause a TLB miss. At 200 nsec each (a guess), this could be up to 5% of your
>>total run time.
>See above.  I don't really probe once for every node.

See above. I never said you did.

This page took 0.24 seconds to execute

Last modified: Thu, 07 Jul 11 08:48:38 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.