Computer Chess Club Archives


Subject: Re: Hashtables: is larger always better?

Author: Robert Hyatt

Date: 19:35:49 09/24/01


On September 24, 2001 at 18:44:04, Uri Blass wrote:

>If the probability is really 1/2^64 for every new node then you are not getting
>one hash collision for every few billion nodes.

There is no way to produce perfectly random hash distributions.

Let's do this in a mathematically sound form.

First, I am going to assume that a real chess position requires 160 bits
to represent it.  This is pretty close to the real figure, so let's treat it
as a fact.

Second, we are hashing 160 bits of information into 64 bits.  That suggests
a _huge_ number of collisions.  In fact, for each possible 64 bit signature
there are 2^160 / 2^64 = 2^96 _different_ chess positions that map to that
same signature.

So the pure counting argument only tells you that collisions _must_ exist; it
does not give you a usable collision rate for the few billion positions a real
search actually visits.  That has to be measured.
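
For concreteness, here is a minimal sketch of the kind of 64 bit signature
being discussed, a Zobrist hash that XORs one random 64 bit key per
piece-on-square.  This is illustrative only; the names and sizes are
assumptions, not Crafty's actual code:

  /* Minimal Zobrist hashing sketch -- illustrative only, not Crafty's code. */
  #include <stdint.h>
  #include <stdlib.h>

  #define PIECES  12                          /* 6 piece types x 2 colors    */
  #define SQUARES 64

  static uint64_t zobrist[PIECES][SQUARES];   /* one random key per piece/sq */
  static uint64_t zobrist_stm;                /* side-to-move key            */

  static uint64_t rand64(void) {              /* crude 64 bit random number  */
    return ((uint64_t)rand() << 48) ^ ((uint64_t)rand() << 32) ^
           ((uint64_t)rand() << 16) ^  (uint64_t)rand();
  }

  static void init_zobrist(void) {            /* fill key tables, done once  */
    for (int p = 0; p < PIECES; p++)
      for (int s = 0; s < SQUARES; s++)
        zobrist[p][s] = rand64();
    zobrist_stm = rand64();
  }

  /* board[s] holds a piece index 0..11, or -1 for an empty square.  The    */
  /* whole position (160+ bits of information) folds down to one 64 bit     */
  /* value, so distinct positions sharing a signature are unavoidable.      */
  static uint64_t hash_position(const int board[SQUARES], int white_to_move) {
    uint64_t sig = white_to_move ? zobrist_stm : 0;
    for (int s = 0; s < SQUARES; s++)
      if (board[s] >= 0)
        sig ^= zobrist[board[s]][s];
    return sig;
  }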

Now, on to the experiment several of us did a few years ago.  I took a bunch
of chess positions, and I did both hashing and _real_ verification.  Each hash
entry also stored the board itself, encoded as 32 bytes (4 bits per square),
so that the true board representation could be compared whenever the 64 bit
signatures matched.  I ran this on a _monster_ machine as well as on my office
machine.  Searching about 1M nodes per second produced fewer than one real
collision per hour.  Here a collision is the _real_ thing...  two positions
where the 64 bit signatures matched exactly, but the 32 byte encodings were
different.  Varying the hash table size didn't have much effect.  We concluded
that such a collision rate was acceptable.
Stanback got similar results.
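
Conceptually the verification looked something like the sketch below; the
entry layout here is an assumption for illustration, not the code from the
original test:

  /* Sketch of the collision check -- entry layout is assumed, not the      */
  /* original test code.  64 squares x 4 bits = 256 bits = 32 bytes.        */
  #include <stdint.h>
  #include <string.h>

  typedef struct {
    uint64_t signature;                 /* 64 bit Zobrist signature          */
    uint8_t  board[32];                 /* 4 bits per square, 64 squares     */
  } VerifyEntry;

  static unsigned long long real_collisions = 0;

  /* Called on every probe; counts the cases where the 64 bit signatures    */
  /* agree but the stored 32 byte board encoding does not.                  */
  static int verify_probe(const VerifyEntry *e,
                          uint64_t signature, const uint8_t board[32]) {
    if (e->signature != signature)
      return 0;                         /* ordinary miss                     */
    if (memcmp(e->board, board, 32) != 0) {
      real_collisions++;                /* a _real_ 64 bit collision         */
      return 0;
    }
    return 1;                           /* genuine match                     */
  }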

We then did the same test using 32 bit hash signatures and found that 32 bits
was totally unusable:  multiple collisions per second for me (at 1M nodes per
second), and multiple per minute for him (at a significantly slower search
speed).

Our conclusion:  64 bits is trustworthy.  32 is not.

For the record, older programs like chess 4.x and Duchess did not store just
a hash signature.  They stored a 32 byte encoding of the board, using the
Zobrist signature for the table address only.  They were afraid of the
signature-only approach.  I did a bunch of testing in 1976 to convince myself
that storing the 64 bit signature was "good enough".



>
>In this case you get one hash collision for every 2^64 nodes, which is clearly
>more than a few billion, and simple math shows that the total number of nodes
>that Deeper Blue can search in 100 years is smaller than 2^64 (I assume that
>Deeper Blue can search 2^30 nodes per second for this discussion).

Doesn't matter.  DB doesn't hash the last 5-7 plies + q-search.  So they won't
be stressing the 64 bit hash signature that badly...



>
>In 100 years you have 3600*24*365*100<2^32 seconds.
>
>The fact that you do get hash collisions in practice, even if only in rare
>cases, proves that the probability is bigger than 1/2^64
>
>I do not say that the probabilities are not going to be enough for a long
>time, and I tend to believe that there is not going to be a problem in the
>next 10 years, but I need more data to be sure about it: namely, the
>probability of playing a different move because of one hash collision at a
>random node.


Easy to test.  Run a test position for several minutes to get the right
answer.  Then re-run it, but where you do the hash probe, make some percentage
of the probes behave in a random fashion.  I.e. the signature doesn't match,
but say it does anyway and return EXACT.

Find out how high that percentage has to be before the fake entries influence
the root score consistently.  Then you know how many false matches it takes to
actually affect the score at the root, and the game.
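
A minimal sketch of that kind of test, assuming a simple one-entry-per-slot
table and a hypothetical probe routine (none of this is any engine's real
probe code):

  /* Sketch of injecting false hash matches -- hypothetical probe routine,  */
  /* not any engine's real code.  A tiny fraction of the probes "lie" and   */
  /* claim an EXACT hit even though the signature does not match.           */
  #include <stdint.h>
  #include <stdlib.h>

  #define HASH_EXACT        1
  #define HASH_MISS         0
  #define FALSE_MATCH_RATE  1e-8        /* about 1 in 100 million probes     */

  typedef struct {
    uint64_t signature;
    int      score;
  } HashEntry;

  extern HashEntry *hash_table;         /* assumed: allocated elsewhere      */
  extern uint64_t   hash_mask;          /* table size minus one (power of 2) */

  static int probe_hash(uint64_t signature, int *score) {
    HashEntry *h = &hash_table[signature & hash_mask];
    if (h->signature == signature) {    /* normal, genuine hit               */
      *score = h->score;
      return HASH_EXACT;
    }
    /* Corrupt a small fraction of the misses: pretend the entry matched     */
    /* and hand back whatever score happens to be stored there.  Assumes     */
    /* RAND_MAX is large (e.g. 2^31 - 1) so the rate is meaningful.          */
    if ((double)rand() / RAND_MAX < FALSE_MATCH_RATE) {
      *score = h->score;
      return HASH_EXACT;
    }
    return HASH_MISS;
  }

Raise FALSE_MATCH_RATE until the root score (and the move played) starts to
change consistently; that rate is the threshold being talked about here.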

I did a small test like this a long while back.  I only tested a false-match
rate of 1 in one hundred million probes and found no effect I could measure at
all.  So it takes far more than one false match every one hundred million
nodes to screw things up...



>
>Maybe somebody can try to simulate a hash collision by changing the score of
>only one random node in the tree and reporting the probability of getting a
>different move as a function of the size of the tree.
>
>I believe that you are going to find that a single hash collision is less
>dangerous when the size of the tree becomes bigger.

Isn't that logical?  The smaller the percentage of false matches, the less
the effect?



>
>Uri


