Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: About random numbers and hashing

Author: Sven Reichard

Date: 12:56:32 12/05/01

Go up one level in this thread


I asked a similar question here before, however without any satisfactory result.
The following is an argument why the Hamming distance can't be more than a crude
heuristic, albeit possibly the best we have before. Sorry if it gets a bit
mathematical.

Assume we associate a bitstring to every piece-square combination. That is
what's usually done in chess programs; some codes are added for the side to
move, castling rights, e.p. squares, etc. We obtain the code of a position by
xor-ing the codes of all the pieces contained in it.

What we want to avoid is collisions at nodes close to the root. For nodes close
to the leaves the cost of recomputing the score is smaller. Hence we want to
avoid that
x1^x2^...^xm = y1^y2^...^yn
for codes xi, yi and small number m and n, and xi not equal to yj

To translate that to a language that is more familiar - at least for people of a
mathematical background - we consider the field F2 of two elements. The elements
are 0 and 1, and we can add and multiply them as usual, with the additional rule
that 1 + 1 = 0. This is really a field, just like the real or complex numbers,
and we can do calculations as usual. Note that addition is just the exclusive
or.

Now the codes or bitstrings become vectors over the field F2, and the bitwise
exclusive or becomes componentwise addition, i.e., usual addition of vectors.
All these vectors form the vector space F2^k, where k is the length of the
vectors. Typically, k = 64.

So, what we want to avoid is an equation
x1 + x2 + ... + xm = y1 + y2 + ... + yn
or
x1 + x2 + ... + xm + y1 + y2 + ... + yn = 0
since in F2, subtraction is the same as addition. Remembering some linear
algebra, this just means that we want the set x1,...,xm,y1,...,yn to be linearly
independent.

This leads to the following criterion for picking a set of hashcodes:
A set of vectors in F2^k is a good set of hash codes if each small subset of
non-zero vectors is linearly independent.

What is not clear here is the meaning of "small", but we want small to be as big
as possible. In other words, we consider sets of size up to a certain size as
small, and if we can make that size bigger, it is better, since this leads to
unique codes deeper in the tree.

However what is clear is that this quality criterion does not depend on the base
of the vector space. I.e., if we have a good set and multiply each vector by an
invertible matrix (in other words, if we rotate the vectors), the obtained set
will be just as good, since the rotation does not change the linear
independence.

The Hamming distance, on the other hand, is highly dependent on the vector space
base. Take for example the vectors (1,0) and (0,1) in F2^2; they have Hamming
distance 2. If we multiply both of them by
(1 1)
(0 1)
we get (1,1) and (0,1), which have Hamming distance 1. Actually we can change
any distance to anything else (except for 0) by an appropriate matrix.

Thus we try to approximate something that is independent from the base (the
quality of our hash codes) by something that depends on it (the Hamming
distance). Simple logic tells you that this approximation has to be real bad.

An example where it doesn't work: It has been said that the Hamming distance
shouldn't be to small or to  big. So, vectors at a distance which is half the
length should be ok, right? Let the length be 8 (I don't want to type too many
0's and 1's), and consider the vectors
11110000
11001100
00111100
They all have weight 4, their pairwise distance is 4, and yet they add up to 0.
Just by looking at Hamming distances, you have no chance of detecting that.

Summarizing I can say that I see no connection between the quality of hash codes
and their Hamming distance. Using a good RNG like the one provided in GNU's
stdlib will yield good hash codes ( you can actually prove that), and so I will
take the codes as they are supplied by rand() or random() without messing with
them and thereby most likely make them worse.

Sorry for the length of this message and my rambling on. Thanks for your
patience and any feedback. Actually I bet a case of beer that nobody reads up to
this point (just kidding).

Later,
Sven.




This page took 0.01 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.