Author: Robert Hyatt
Date: 11:52:10 07/20/03
Go up one level in this thread
On July 19, 2003 at 00:36:26, Peter McKenzie wrote: >On July 18, 2003 at 23:45:16, Robert Hyatt wrote: > >>On July 18, 2003 at 21:58:18, J. Wesley Cleveland wrote: >> >>>On July 18, 2003 at 21:17:14, Robert Hyatt wrote: >>> >>>>On July 18, 2003 at 15:21:35, J. Wesley Cleveland wrote: >>>> >>>>>On July 17, 2003 at 18:25:51, Robert Hyatt wrote: >>>>> >>>>>>On July 17, 2003 at 17:35:33, Dieter Buerssner wrote: >>>>>> >>>>>[snip] >>>>>>> >>>>>>>I cannot find any randomness in the reads of lm-bench (I downloaded latest >>>>>>>stable source today, not the experimental version, available, too). If it would >>>>>>>do random reads, it would have no way to avoid the problem with the TLBs you >>>>>>>explained. >>>>>> >>>>>>4M pages solves it for at least 250mb worth of RAM. But then again, _no_ chess >>>>>>program depends on purely random memory accesses to blow out the TLB. The only >>>>>>truly random accesses I do are the regular hashing and pawn hashing, which >>>>>>both total to significantly less than the total nodes I search. Which means >>>>>>the TLB penalty is not even 1% of my total run time. Probably closer to >>>>>>.01% - .05%. >>>>>> >>>>>>I ignore that. >>>>> >>>>>Why do you think it is that low? I get ~20-30% of nodes have hash probes with >>>>>crafty. >>>> >>>> >>>>Look at the code. >>>I not only looked at the code. I *instrumented it*. I won't have complete >>>results until Monday, but it appears that crafty spends 3-5% of its total time >>>inside hashprobe on my (slow) machine and a prefetch could reduce that by about >>>half. >>> >>>>Crafty probes memory _once_ for a hash probe. That >>>>introduces a memory access penalty once per node in the basic search, >>>>less than once per node in the q-search (I only probe phash there and I >>>>don't probe it but about 25% of the q-search nodes I visit). >>> >>>If you had read whai I wrote, you would see I said crafty does a hash probe >>>20-30% of its total nodes. >> >>OK. I clearly mis-read what you meant. the 20-30% was eye-catching as that >>is a pretty common hash hit percentage as well... >> >> >>> >>>>As a result, you get less than one probe per node searched. A node searched >>>>requires something on the order of 3000-5000 instructions. What percentage >>>>of that 3K-5K instruction timing is that single hash probe? Almost zero. >>> >>>Except that a fast machine may do these 3-5K instructions in <1usec. A cache >>>miss + a TLB miss may take 300-400 ns. I would not call 30% almost 0. >> >>You are missing my point. In the position(s) you tested, you saw 20-30% > >Bob, there is really no need for the somewhat hostile 'You are missing my >point', especially as its clear you have been somewhat careless in paying >attention to Mr Cleveland's politely made points. I don't see _anything_ hostile in "you are missing my point." I didn't say "you are too dumb to understand my point" or anything similar. "you are missing my point" could mean (a) you overlooked something; (b) you misunderstood something; (c) I didn't explain it clearly; (d) any of another dozen things. If that sounds "hostile" it wasn't intentional. However, it doesn't sound the least bit hostile to me. > >I understand that Vincent has gotten your back up on this issue, but if you can >forget all the crap for a minute I think there might be something of value in >all this stuff. > >Its just possible that your statement of: >"Which means the TLB penalty is not even 1% of my total run time. Probably >closer to .01% - .05%." >is a bit off the mark. No shame if it is. > >Here's hoping we can find out in a scientific manner. I've measured this in the past. In fact, if you look back thru the archive, you'll note that when I changed from the two-table approach to a single probe to get two entries (exactly the same approach, but with one less random memory access) the improvement was minimal. In the 1% or less range if I recall correctly. I don't think a _single_ memory read is going to affect a thing. > >cheers, >Peter > > >>hash probes. That means one probe for every 3-5 nodes. At 1M nodes >>per second, that is 200K-300K probes per second. If you measure the >>time spent in searching a single node, multiply that by 3-5X, then compare >>that to the hash probe time, the time spent probing the hash table is low. >> >>Note that your 5% is _not_ the total time used to probe the table. It is >>the time to probe the table, and do it _twice_ although the second probe >>doesn't have any memory access penalty associated with it in most cases. >> >>So a big percent of that 5% is doing the actual work done in HashProbe(), >>rather than being all memory access penalty... >> >> >> >> >> >> >>>> >>>>Ignore hits and misses, that is not the issue here. The issue is the cost of >>>>doing the probe itself, which is essentially zero. >>>> >>>> >>>> >>>> >>>>>If you are getting 1m nodes/sec, then this is a probe every 3-5 usec. >>>>>With a very large hash table and 4K pages, the large majority of these will >>>>>cause a TLB miss. At 200 nsec each (a guess), this could be up to 5% of your >>>>>total run time. >>>> >>>>See above. I don't really probe once for every node. >>> >>>See above. I never said you did.
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.