Author: Robert Hyatt
Date: 18:20:00 07/18/03
On July 18, 2003 at 17:27:21, Vincent Diepeveen wrote:

>On July 18, 2003 at 15:21:35, J. Wesley Cleveland wrote:
>
>>On July 17, 2003 at 18:25:51, Robert Hyatt wrote:
>>
>>>On July 17, 2003 at 17:35:33, Dieter Buerssner wrote:
>>>
>>[snip]
>>>>
>>>>I cannot find any randomness in the reads of lm-bench (I downloaded latest
>>>>stable source today, not the experimental version, available, too). If it would
>>>>do random reads, it would have no way to avoid the problem with the TLBs you
>>>>explained.
>>>
>>>4M pages solves it for at least 250mb worth of RAM. But then again, _no_ chess
>>>program depends on purely random memory accesses to blow out the TLB. The only
>>>truly random accesses I do are the regular hashing and pawn hashing, which
>>>both total to significantly less than the total nodes I search. Which means
>>>the TLB penalty is not even 1% of my total run time. Probably closer to
>>>.01% - .05%.
>>>
>>>I ignore that.
>>
>>Why do you think it is that low? I get ~20-30% of nodes have hash probes with
>>crafty. If you are getting 1m nodes/sec, then this is a probe every 3-5 usec.
>
>Crafty qsearch: 60% of the nodes come there. Note bob claimed less than that
>when asked after how efficient his qsearch was. But i had measured it at 60% of
>all nodes being in qsearch.
>
>40% ==> transposition table left.
>
>speed at bobs dual Xeon: 2.2 MLN a second bob claimed a while ago here.
>RAM speed 133Mhz. Random Latency to get 32 bytes at a 384MB hashtable: 500 ns
>
>Then around 96% of that you again have to write back to hashtable so that's
>another 500ns.
>
>40% x 2.2 MLN = 880KB
>
>In total 1 us.
>
>Now the luck Bob has is that you can do a read in parallel. Only when writing
>you have a problem (depends upon chipset).
>
>It is not hard to see that a big bunch of the system time goes to hashtables.
>
>Some time ago when P4 was pretty new they measured for specint how much system
>time went to crafty's hashtable actually.
>
>Note that specint uses a very SMALL hashtable. Like 2MB or so.
>
>So *trivially* the number they got there
>  a) processor is 2 times faster now and even 4 when running 2 threads.
>     we'll skip the fact that bob is even running 4 threads then.
>  b) with such a small hashtable chance is *a lot* bigger it is in the 512KB
>     cache.
>
>They got to 10% system time then for crafty losing to RAM lookups. But that of
>course also includes lookups to the slow rotated bitboards (which need like 1.5
>MB datastructure all tables together) and that includes pawnhashtable and so on.
>
>But now imagine a dual Xeon with a *way* slower bios. And fact that crafty runs
>with big hashtable in reality and not with just 1 or 2 MB hashtable like Bob
>said they use in specint.
>
>Anyway, Bob knew about this 10% number of accesses to RAM. It is really sick
>that he does as if he doesn't know it now.

What is really sick is your making numbers up. If you _only_ talk about hash
table, not pawn hash table, then your 40% is way high. A single probe is such
a _small_ percentage of the total time needed to search a single node, your
number simply looks ridiculous. And ridiculous it _is_.

>
>>With a very large hash table and 4K pages, the large majority of these will
>>cause a TLB miss. At 200 nsec each (a guess), this could be up to 5% of your
>>total run time.
>>
>>[snip]
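
Editorial aside, not part of any poster's message: the back-of-the-envelope figures traded in this exchange can be checked with a few lines of arithmetic. The sketch below only plugs in the numbers as quoted (1M nodes/sec with 20-30% of nodes probing and a guessed 200 ns TLB-miss cost; 2.2M nodes/sec with 40% of nodes probing). The function names are made up for illustration, and the default assumption that every probe misses the TLB is a worst case, not a measurement.

```python
def probes_per_second(nodes_per_sec, probe_fraction):
    """Number of hash probes per second, given the share of nodes that probe."""
    return nodes_per_sec * probe_fraction


def probe_interval_usec(nodes_per_sec, probe_fraction):
    """Average time between two hash probes, in microseconds."""
    return 1e6 / probes_per_second(nodes_per_sec, probe_fraction)


def tlb_overhead_fraction(nodes_per_sec, probe_fraction,
                          miss_penalty_ns, miss_rate=1.0):
    """Fraction of run time spent on TLB-miss penalties.

    miss_rate=1.0 assumes every probe misses the TLB (worst case).
    """
    probes = probes_per_second(nodes_per_sec, probe_fraction)
    return probes * miss_rate * miss_penalty_ns * 1e-9


if __name__ == "__main__":
    # Cleveland's figures: 1M nodes/sec, 20-30% of nodes probe, 200 ns per miss.
    for fraction in (0.20, 0.30):
        interval = probe_interval_usec(1e6, fraction)
        overhead = tlb_overhead_fraction(1e6, fraction, 200)
        print(f"{fraction:.0%} probing: one probe every {interval:.1f} usec, "
              f"TLB overhead {overhead:.1%}")

    # Diepeveen's figures: 2.2M nodes/sec, 40% of nodes probe.
    print(f"probes/sec at 2.2M nps, 40%: {probes_per_second(2.2e6, 0.40):,.0f}")
```

With these inputs the worst-case TLB overhead lands between 4% and 6% of run time, which matches the "up to 5%" estimate; a lower miss rate or a smaller share of probing nodes scales the result down linearly.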
Last modified: Thu, 15 Apr 21 08:11:13 -0700