Computer Chess Club Archives

Subject: Re: Another memory latency test

Author: Robert Hyatt
Date: 18:20:00 07/18/03

On July 18, 2003 at 17:27:21, Vincent Diepeveen wrote:

>On July 18, 2003 at 15:21:35, J. Wesley Cleveland wrote:
>
>>On July 17, 2003 at 18:25:51, Robert Hyatt wrote:
>>
>>>On July 17, 2003 at 17:35:33, Dieter Buerssner wrote:
>>>
>>[snip]
>>>>
>>>>I cannot find any randomness in the reads of lm-bench (I downloaded the latest
>>>>stable source today, not the experimental version, which is also available).
>>>>If it did random reads, it would have no way to avoid the problem with the
>>>>TLBs you explained.
>>>
>>>4M pages solve it for at least 250MB worth of RAM.  But then again, _no_ chess
>>>program depends on purely random memory accesses to blow out the TLB.  The only
>>>truly random accesses I do are the regular hashing and pawn hashing, which
>>>together total significantly fewer than the total nodes I search.  Which means
>>>the TLB penalty is not even 1% of my total run time.  Probably closer to
>>>.01% - .05%.
>>>
>>>I ignore that.
>>
>>Why do you think it is that low? I see hash probes on ~20-30% of nodes with
>>crafty. If you are getting 1M nodes/sec, then this is a probe every 3-5 usec.
>
>Crafty qsearch: 60% of the nodes land there. Note Bob claimed less than that
>when asked how efficient his qsearch was. But I had measured 60% of all nodes
>being in qsearch.
>
>That leaves 40% ==> transposition table.
>
>Speed on Bob's dual Xeon: 2.2 MLN nodes a second, Bob claimed a while ago here.
>RAM speed 133MHz. Random latency to get 32 bytes from a 384MB hashtable: 500 ns
>
>Then for around 96% of those you have to write back to the hashtable again, so
>that's another 500 ns.
>
>40% x 2.2 MLN = 880K probes a second
>
>In total about 1 us per probe.
>
>Now the luck Bob has is that you can do a read in parallel. Only when writing
>you have a problem (depends upon chipset).
>
>It is not hard to see that a big bunch of the system time goes to hashtables.
>
>Some time ago when P4 was pretty new they measured for specint how much system
>time went to crafty's hashtable actually.
>
>Note that specint uses a very SMALL hashtable. Like 2MB or so.
>
>So *trivially* the number they got there is too low for today:
>  a) the processor is 2 times faster now and even 4 when running 2 threads.
>     We'll skip the fact that Bob is even running 4 threads then.
>  b) with such a small hashtable the chance is *a lot* bigger that an entry is
>     in the 512KB cache.
>
>Back then they got 10% of system time lost by crafty to RAM lookups. But that of
>course also includes lookups to the slow rotated bitboards (which need like 1.5
>MB of data structures, all tables together), and it includes the pawn hashtable
>and so on.
>
>But now imagine a dual Xeon with a *way* slower bios. And the fact that crafty
>runs with a big hashtable in reality, not with just a 1 or 2 MB hashtable like
>Bob said they use in specint.
>
>Anyway, Bob knew about this 10% number of accesses to RAM. It is really sick
>that he acts as if he doesn't know it now.

What is really sick is your making numbers up.

If you _only_ talk about the hash table, not the pawn hash table, then your 40%
is way high.  A single probe is such a _small_ percentage of the total time
needed to search a single node that your number simply looks ridiculous.

And ridiculous it _is_.
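
For reference, the arithmetic in the quoted estimate can be laid out explicitly. The sketch below only multiplies out the figures given in the quote (2.2 MLN nodes a second, 40% of nodes probing, 500 ns per random read, a write-back on 96% of probes). The function and parameter names are made up for illustration, none of the inputs are measurements, and the 40% figure is exactly the one disputed above.

```python
def hash_latency_estimate(nodes_per_sec, probe_fraction,
                          read_ns, write_fraction, write_ns):
    """Multiply out a hash-table latency estimate.

    Assumes every probe pays the full memory latency (no cache hits,
    no overlap between accesses), so the result is an upper bound
    for the given inputs, not a measurement.
    """
    # Probes per second: the share of nodes that touch the table.
    probes_per_sec = nodes_per_sec * probe_fraction
    # Average cost of one probe: one read plus a write-back on some of them.
    ns_per_probe = read_ns + write_fraction * write_ns
    # Seconds of memory latency accumulated per second of search,
    # summed over all threads.
    stall_sec_per_sec = probes_per_sec * ns_per_probe * 1e-9
    return probes_per_sec, ns_per_probe, stall_sec_per_sec


# Inputs taken from the quoted post; all of them are claims, not measurements.
probes, per_probe, stall = hash_latency_estimate(
    nodes_per_sec=2_200_000, probe_fraction=0.40,
    read_ns=500, write_fraction=0.96, write_ns=500)
print(f"{probes:.0f} probes/sec, {per_probe:.0f} ns per probe, "
      f"{stall:.2f} s of latency per second of search")
```

With those inputs the model gives 880K probes a second at roughly 1 us each. Lowering `probe_fraction` or the latency figures scales the result linearly, which is why the choice of 40% matters so much.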


>
>>With a very large hash table and 4K pages, the large majority of these will
>>cause a TLB miss. At 200 nsec each (a guess), this could be up to 5% of your
>>total run time.
>>
>>[snip]
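
The TLB estimate quoted above can be checked the same way. This is a minimal sketch assuming the quoted figures: 1M nodes/sec, hash probes on 20-30% of nodes, and 200 nsec per TLB miss, which the quote itself labels a guess. The `miss_fraction` parameter is hypothetical and stands in for "the large majority" of probes missing the TLB.

```python
def probe_interval_us(nodes_per_sec, probe_fraction):
    """Average time between hash probes, in microseconds."""
    return 1e6 / (nodes_per_sec * probe_fraction)


def tlb_overhead_fraction(nodes_per_sec, probe_fraction,
                          miss_fraction, miss_ns):
    """Fraction of run time spent on TLB misses caused by hash probes."""
    misses_per_sec = nodes_per_sec * probe_fraction * miss_fraction
    return misses_per_sec * miss_ns * 1e-9


# Figures from the quoted post; 200 ns is the poster's guess, and
# miss_fraction=1.0 is the worst case where every probe misses the TLB.
for fraction in (0.20, 0.30):
    interval = probe_interval_us(1_000_000, fraction)
    overhead = tlb_overhead_fraction(1_000_000, fraction,
                                     miss_fraction=1.0, miss_ns=200)
    print(f"probe fraction {fraction:.0%}: one probe every {interval:.1f} usec, "
          f"TLB overhead {overhead:.1%} of run time")
```

With every probe missing, the model lands at about 4% to 6% of run time, in line with the "up to 5%" quoted above. A smaller `miss_fraction`, as with 4M pages, shrinks the overhead proportionally.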
Last modified: Thu, 15 Apr 21 08:11:13 -0700