Computer Chess Club Archives




Subject: Re: Another memory latency test

Author: Robert Hyatt

Date: 11:22:56 07/22/03


On July 21, 2003 at 15:35:17, J. Wesley Cleveland wrote:

>On July 18, 2003 at 23:45:16, Robert Hyatt wrote:
>>On July 18, 2003 at 21:58:18, J. Wesley Cleveland wrote:
>>>On July 18, 2003 at 21:17:14, Robert Hyatt wrote:
>>>>On July 18, 2003 at 15:21:35, J. Wesley Cleveland wrote:
>>>>>On July 17, 2003 at 18:25:51, Robert Hyatt wrote:
>>>>>>On July 17, 2003 at 17:35:33, Dieter Buerssner wrote:
>>>>>>>I cannot find any randomness in the reads of lm-bench (I downloaded latest
>>>>>>>stable source today, not the experimental version, available, too). If it would
>>>>>>>do random reads, it would have no way to avoid the problem with the TLBs you
>>>>>>4M pages solves it for at least 250mb worth of RAM.  But then again, _no_ chess
>>>>>>program depends on purely random memory accesses to blow out the TLB.  The only
>>>>>>truly random accesses I do are the regular hashing and pawn hashing, which
>>>>>>both total to significantly less than the total nodes I search.  Which means
>>>>>>the TLB penalty is not even 1% of my total run time.  Probably closer to
>>>>>>.01% - .05%.
>>>>>>I ignore that.
>>>>>Why do you think it is that low? I get ~20-30% of nodes have hash probes with
>>>>Look at the code.
>>>I not only looked at the code. I *instrumented it*. I won't have complete
>>>results until Monday, but it appears that crafty spends 3-5% of its total time
>>>inside hashprobe on my (slow) machine and a prefetch could reduce that by about
>>>>Crafty probes memory _once_ for a hash probe.  That
>>>>introduces a memory access penalty once per node in the basic search,
>>>>less than once per node in the q-search (I only probe phash there and I
>>>>don't probe it but about 25% of the q-search nodes I visit).
>>>If you had read what I wrote, you would see I said crafty does a hash probe
>>>on 20-30% of its total nodes.
>>OK.  I clearly misread what you meant.  The 20-30% was eye-catching as that
>>is a pretty common hash hit percentage as well...
>>>>As a result, you get less than one probe per node searched.  A node searched
>>>>requires something on the order of 3000-5000 instructions.  What percentage
>>>>of that 3K-5K instruction timing is that single hash probe?  Almost zero.
>>>Except that a fast machine may do these 3-5K instructions in <1usec. A cache
>>>miss + a TLB miss may take 300-400 ns. I would not call 30% almost 0.
>>You are missing my point.  In the position(s) you tested, you saw 20-30%
>>hash probes.  That means one probe for every 3-5 nodes.  At 1M nodes
>>per second, that is 200K-300K probes per second.  If you measure the
>>time spent in searching a single node, multiply that by 3-5X, then compare
>>that to the hash probe time, the time spent probing the hash table is low.
>>Note that your 5% is _not_ the total time used to probe the table.  It is
>>the time to probe the table, and do it _twice_ although the second probe
>>doesn't have any memory access penalty associated with it in most cases.
>>So a big percent of that 5% is doing the actual work done in HashProbe(),
>>rather than being all memory access penalty...
>I ran some tests on my slow (450 MHz) machine. Hash was set to 192 MB. The test
>was 21 middle-game positions and ran for nearly 1 hour. Crafty got between 125k
>and 230k nps. Crafty spent 3.6% of total time in HashProbe. I added the
>following code just before the call to RepetitionCheck() in search.c (slightly
>modified from the code in hash.c). Note that the code is basically a no-op as
>all variables are local.
>  static BITBOARD word1;
>  BITBOARD temp_hashkey;
>  HASH_ENTRY *htable;
>/*
> ----------------------------------------------------------
>|                                                          |
>|   first, compute the initial hash address and choose     |
>|   which hash table (based on color) to probe.            |
>|                                                          |
> ----------------------------------------------------------
>*/
>  temp_hashkey=(wtm) ? HashKey : ~HashKey;
>  htable=trans_ref_a+((int) temp_hashkey&hash_maska);
>  word1=htable->word1;
>Now crafty spends 2.8% of its time in HashProbe.

I'm not sure what that is supposed to prove.

I.e., you are going to get more hash failures, assuming you are only doing
a probe to one entry, which the above seems to suggest.  That is going to
change the shape of the tree, which is going to change the number of
calls to HashProbe(), which is going to change ...

If you are suggesting that you saved 0.8% by not using the second entry,
that's not what is happening, if I understand your fix correctly.  You are
doing _less_ work in HashProbe() because you are never doing the second
probe's work...  Eliminating the second probe, and all the testing,
certainly suggests that the actual time spent waiting for the data for
the second probe is < 0.1%...
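
For anyone following the quoted snippet, here is a minimal sketch of the
general "touch the entry early" idea it is built on: compute the table
address ahead of time and pull that entry toward the cache, so that a later
probe of the same key is less likely to stall on memory. This is not Crafty's
code. ToyEntry, toy_table, toy_index, toy_touch, toy_store and toy_probe are
all hypothetical names, the table is a small toy with a single entry per
index, and __builtin_prefetch assumes GCC or Clang (other compilers fall back
to a plain load).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry layout; the real HASH_ENTRY is not shown in the thread. */
typedef struct {
    uint64_t word1;  /* assumed: packed payload (score, depth, move, ...) */
    uint64_t word2;  /* assumed: full key, used to verify a hit */
    int      valid;  /* toy-only flag so an empty slot never matches key 0 */
} ToyEntry;

#define TOY_TABLE_SIZE 1024u  /* must be a power of two for the mask below */

static ToyEntry toy_table[TOY_TABLE_SIZE];

/* Map a 64-bit key to a slot by masking its low bits. */
static size_t toy_index(uint64_t key) {
    assert((TOY_TABLE_SIZE & (TOY_TABLE_SIZE - 1u)) == 0u);
    return (size_t)(key & (uint64_t)(TOY_TABLE_SIZE - 1u));
}

/* Early touch: ask for the entry well before the real probe needs it.
   The call has no visible effect on program state. */
static void toy_touch(uint64_t key) {
    const ToyEntry *e = &toy_table[toy_index(key)];
#if defined(__GNUC__) || defined(__clang__)
    __builtin_prefetch(e, 0, 3);        /* read access, keep in cache */
#else
    volatile uint64_t sink = e->word1;  /* plain load as a fallback */
    (void)sink;
#endif
}

/* Always-replace store, kept trivial on purpose. */
static void toy_store(uint64_t key, uint64_t data) {
    ToyEntry *e = &toy_table[toy_index(key)];
    e->word1 = data;
    e->word2 = key;
    e->valid = 1;
}

/* Returns 1 and writes *data on a hit, 0 on a miss. */
static int toy_probe(uint64_t key, uint64_t *data) {
    const ToyEntry *e = &toy_table[toy_index(key)];
    if (!e->valid || e->word2 != key)
        return 0;
    *data = e->word1;
    return 1;
}
```

The intended use is to call toy_touch(key) as early as the key is known, do
other per-node work, and only then call toy_probe(key, &data). Whether that
buys anything measurable depends on how much work sits between the two calls
and on whether the entry would have been in cache anyway, which is the point
under dispute above.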


Last modified: Thu, 07 Jul 11 08:48:38 -0700
