Computer Chess Club Archives



Subject: Re: Another memory latency test

Author: Vincent Diepeveen

Date: 14:27:21 07/18/03



On July 18, 2003 at 15:21:35, J. Wesley Cleveland wrote:

>On July 17, 2003 at 18:25:51, Robert Hyatt wrote:
>
>>On July 17, 2003 at 17:35:33, Dieter Buerssner wrote:
>>
>[snip]
>>>
>>>I cannot find any randomness in the reads of lm-bench (I downloaded latest
>>>stable source today, not the experimental version, available, too). If it would
>>>do random reads, it would have no way to avoid the problem with the TLBs you
>>>explained.
>>
>>4M pages solves it for at least 250mb worth of RAM.  But then again, _no_ chess
>>program depends on purely random memory accesses to blow out the TLB.  The only
>>truly random accesses I do are the regular hashing and pawn hashing, which
>>both total to significantly less than the total nodes I search.  Which means
>>the TLB penalty is not even 1% of my total run time.  Probably closer to
>>.01% - .05%.
>>
>>I ignore that.
>
>Why do you think it is that low? I get ~20-30% of nodes have hash probes with
>crafty. If you are getting 1m nodes/sec, then this is a probe every 3-5 usec.

Crafty's qsearch: 60% of the nodes end up there. Note that Bob claimed less than
that when asked how efficient his qsearch was, but I measured 60% of all nodes
being in qsearch.

40% ==> the nodes left that probe the transposition table.

Speed on Bob's dual Xeon: 2.2 million nodes a second, as Bob claimed here a while ago.
RAM speed: 133 MHz. Random latency to get 32 bytes from a 384MB hashtable: 500 ns.

Then for around 96% of those probes you have to write back to the hashtable
again, so that's another 500 ns.

40% x 2.2 million = 880K probes a second.

In total 1 us per probe.

Now the luck Bob has is that you can do reads in parallel. Only when writing do
you have a problem (it depends on the chipset).

It is not hard to see that a big chunk of the system time goes to hashtables.
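
To put rough numbers on that, here is a back-of-envelope sketch using only the figures above. The thread count of 4 is an assumption for illustration, and the sketch ignores cache hits and any overlap between memory accesses, so treat the result as an upper bound rather than a measurement.

```python
# Back-of-envelope estimate built from the figures quoted in this post.
# Nothing here is measured; THREADS in particular is an assumption.
NODES_PER_SEC = 2_200_000      # claimed speed of the dual Xeon
PROBING_FRACTION = 0.40        # nodes outside qsearch, which probe the table
READ_LATENCY_NS = 500          # random read into a 384MB hashtable
WRITE_FRACTION = 0.96          # share of probes followed by a store
WRITE_LATENCY_NS = 500         # cost of that store
THREADS = 4                    # assumption: load spread evenly over 4 threads

# Hash probes per second across the whole machine.
probes_per_sec = NODES_PER_SEC * PROBING_FRACTION

# Average memory latency per probe: one read plus a write most of the time.
ns_per_probe = READ_LATENCY_NS + WRITE_FRACTION * WRITE_LATENCY_NS

# Seconds of memory latency accumulated per wall-clock second, all threads.
stall_sec_per_sec = probes_per_sec * ns_per_probe * 1e-9

# Share of each thread's time if the latency is split evenly and never overlaps.
stall_share_per_thread = stall_sec_per_sec / THREADS

print(f"probes per second:      {probes_per_sec:,.0f}")
print(f"latency per probe:      {ns_per_probe:.0f} ns")
print(f"stall seconds per sec:  {stall_sec_per_sec:.4f}")
print(f"stall share per thread: {stall_share_per_thread:.1%}")
```

With these inputs the total is roughly 0.86 seconds of memory latency per wall-clock second, or about 22% of each thread's time when spread over four threads.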

Some time ago, when the P4 was pretty new, they measured for SPECint how much
system time actually went to Crafty's hashtable.

Note that SPECint uses a very SMALL hashtable, like 2MB or so.

So *trivially* the number they got there understates things today:
  a) the processor is 2 times faster now, and even 4 times when running 2 threads.
     We'll skip the fact that Bob is even running 4 threads.
  b) with such a small hashtable the chance is *a lot* bigger that an entry is
     in the 512KB cache.

Back then they got to 10% of system time for Crafty being lost to RAM lookups.
But that of course also includes lookups to the slow rotated bitboards (which
need something like a 1.5 MB data structure, all tables together), and it
includes the pawn hashtable and so on.

But now imagine a dual Xeon with a *way* slower BIOS. And the fact that Crafty
in reality runs with a big hashtable, and not with just a 1 or 2 MB hashtable
like Bob said they use in SPECint.

Anyway, Bob knew about this 10% figure for accesses to RAM. It is really sick
that he acts as if he doesn't know it now.

>With a very large hash table and 4K pages, the large majority of these will
>cause a TLB miss. At 200 nsec each (a guess), this could be up to 5% of your
>total run time.
>
>[snip]
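
The quoted estimate can be checked with a quick sketch. Every input is a figure or guess from the quoted post (1M nodes/sec, 20-30% of nodes probing, 200 ns per TLB miss); the `miss_rate` parameter and the function names are hypothetical, added only so the "large majority miss" assumption can be varied.

```python
# Rough check of the quoted TLB estimate. All inputs are figures or guesses
# from the quoted post, not measurements.
TLB_MISS_NS = 200            # the quoted guess for the cost of one TLB miss
QUOTED_NPS = 1_000_000       # "1m nodes/sec" from the quoted post


def probe_interval_us(probe_fraction, nps=QUOTED_NPS):
    """Average time between hash probes, in microseconds."""
    return 1e6 / (nps * probe_fraction)


def tlb_time_share(probe_fraction, miss_rate=1.0, nps=QUOTED_NPS):
    """Fraction of run time spent on TLB misses.

    miss_rate is a hypothetical knob: 1.0 means every probe misses the TLB.
    """
    return nps * probe_fraction * miss_rate * TLB_MISS_NS * 1e-9


for fraction in (0.20, 0.30):
    print(f"{fraction:.0%} of nodes probing: "
          f"one probe every {probe_interval_us(fraction):.1f} us, "
          f"TLB share {tlb_time_share(fraction):.1%}")
```

If every probe misses the TLB, this comes out at about 4% to 6% of run time, the same ballpark as the quoted 5%.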




Last modified: Thu, 15 Apr 21 08:11:13 -0700
