Computer Chess Club Archives



Subject: Re: Data on hash table probing and memory bandwidth

Author: Robert Hyatt

Date: 09:02:40 12/16/02




After some off-the-wall comments by Vincent, I decided to resurrect a small
project I had put aside a while back.

As most know, I use the "Belle hash approach": two tables, one depth-preferred,
where the entry with the deepest "draft" is kept, and one "always store," where
we store when we can't store in the first table due to insufficient draft.  The
second table is 2X the size of the first.
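To make the two-table scheme concrete, here is a rough sketch in C.  The names,
table sizes, and the packing of draft into the data word are my illustrative
assumptions, not Crafty's actual code:

```c
#include <stdint.h>

/* Illustrative 16-byte entry: 64-bit signature plus a packed word
   holding draft, move, score, and bound (packing is an assumption). */
typedef struct {
  uint64_t signature;  /* full hash key, for verification          */
  uint64_t data;       /* packed draft/move/score/bound            */
} Entry;

#define DP_BITS 16
#define DRAFT(e) ((int)((e)->data & 0xff))   /* assumed packing    */

static Entry depth_preferred[1 << DP_BITS];        /* table 1      */
static Entry always_store[1 << (DP_BITS + 1)];     /* table 2, 2X  */

/* Store: the deepest draft keeps the depth-preferred slot;
   anything shallower falls through to the always-store table. */
void hash_store(uint64_t key, uint64_t data, int draft) {
  Entry *dp = &depth_preferred[key & ((1u << DP_BITS) - 1)];
  if (draft >= DRAFT(dp)) {
    dp->signature = key;
    dp->data = data;
  } else {
    Entry *as = &always_store[key & ((1u << (DP_BITS + 1)) - 1)];
    as->signature = key;
    as->data = data;
  }
}

/* Probe: the two candidate entries live in two unrelated tables,
   so a probe that checks both can cost two cache misses. */
Entry *hash_probe(uint64_t key) {
  Entry *dp = &depth_preferred[key & ((1u << DP_BITS) - 1)];
  if (dp->signature == key) return dp;
  Entry *as = &always_store[key & ((1u << (DP_BITS + 1)) - 1)];
  if (as->signature == key) return as;
  return 0;
}
```

The point of the sketch is just the memory layout: the two entries for one
position are indexed into two separate arrays, so they cannot share a cache
line.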

The problem with the two-table approach is that the two entries are definitely
going to end up in two separate cache lines, because the two table entries are
not contiguous.  Vincent claimed this was a huge penalty.

Here are the facts:

First, I had previously modified my two-table approach to combine everything
into one big table.  Each "bucket" now holds three entries.  The first is the
same depth-preferred entry as previously used.  The other two are "always
store," and a single bit of the hash signature chooses which one is used.  This
means that the two algorithms use _exactly_ the same number of entries, and
that the entries are probed in _exactly_ the same way.  Except that now the
three entries are 48 consecutive bytes in memory.

With a 64-byte L1 cache line, 1/2 of the time the depth-preferred entry will be
the first or second sixteen bytes of the line.  This will cause the two
always-store entries to be pre-fetched, since they will be part of the same
cache line.  The other 1/2 of the time, the depth-preferred entry will be in
bytes 32-47 or 48-63.  If the latter, the always-store entry will be a cache
miss 100% of the time; for the 32-47 case, 1/2 of the time the needed
always-store entry will be a hit, and 1/2 of the time it will be a miss.
Averaging those cases, a probe now costs 1.375 cache misses rather than the
2.0 of the old approach.
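The combined layout, and the averaging over the four possible bucket
alignments, can be sketched like this (again, names and the draft packing are
my assumptions, not Crafty's source):

```c
#include <stdint.h>

/* One 48-byte bucket: the depth-preferred entry followed by two
   always-store entries, contiguous in memory. */
typedef struct {
  uint64_t signature;
  uint64_t data;       /* packed draft/move/score/bound (assumed)  */
} Entry;

typedef struct {
  Entry dp;            /* depth-preferred entry                    */
  Entry as[2];         /* two always-store slots                   */
} Bucket;              /* 48 consecutive bytes                     */

#define TABLE_BITS 16
#define DRAFT(e) ((int)((e)->data & 0xff))   /* assumed packing    */

static Bucket table[1 << TABLE_BITS];

/* Same replacement policy as the two-table scheme: the deepest
   draft wins the depth-preferred slot; otherwise one signature
   bit picks which of the two always-store slots takes the entry. */
void hash_store(uint64_t key, uint64_t data, int draft) {
  Bucket *b = &table[key & ((1u << TABLE_BITS) - 1)];
  if (draft >= DRAFT(&b->dp)) {
    b->dp.signature = key;
    b->dp.data = data;
  } else {
    Entry *as = &b->as[(key >> 63) & 1];   /* bit selects the slot */
    as->signature = key;
    as->data = data;
  }
}

/* Expected cache lines touched by a probe that reads the
   depth-preferred entry and then one always-store entry.
   48-byte buckets in 64-byte lines repeat every lcm(48,64) = 192
   bytes, so bucket starts cycle through line offsets 0,48,32,16. */
double average_line_touches(void) {
  int touches = 0, cases = 0;
  for (int start = 0; start < 192; start += 48) {
    for (int which = 0; which < 2; which++) {
      int dp_line = start / 64;
      int as_line = (start + 16 + 16 * which) / 64;
      touches += 1 + (as_line != dp_line);
      cases++;
    }
  }
  return (double)touches / cases;   /* 1.375 */
}
```

Enumerating all eight (alignment, slot) cases is what produces the 1.375
average: half the alignments never split, one always splits, and one splits
half the time.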

Sounds good.

I ran the test.

I will give a few NPS values for comparison (in each row, the first nps is
from the _old_ approach, the second from the new, supposedly faster approach):

  old           new
  nps=911k      nps=914k
  nps=1400k     nps=1425k
  nps=1325k     nps=1353k
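As a quick sanity check (a throwaway calculation, not part of Crafty), the
relative gains from those runs work out as follows:

```c
#include <stdio.h>

/* Percentage speedup going from an old NPS figure to a new one. */
double percent_gain(double old_nps, double new_nps) {
  return 100.0 * (new_nps - old_nps) / old_nps;
}

/* The three runs above give roughly +0.3%, +1.8%, and +2.1%. */
void print_gains(void) {
  double old_nps[] = {911, 1400, 1325};
  double new_nps[] = {914, 1425, 1353};
  for (int i = 0; i < 3; i++)
    printf("%.0fk -> %.0fk : %+.1f%%\n", old_nps[i], new_nps[i],
           percent_gain(old_nps[i], new_nps[i]));
}
```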

I think the conclusion should be pretty clear.  There is barely enough
difference to measure: going from 1325K to 1353K is about a 2% speed increase.
The point being, as I have said _many_ times, that the time spent probing the
hash table is not a significant issue, although Vincent claims otherwise.  The
two-table (separate tables) approach is hardly any less efficient than one
combined table.  This is the same sort of conclusion I reached in the famous
"your smp-lock approach sucks" argument: it is better to _test_ than to
_guess_.



