Computer Chess Club Archives

Search

Terms

Messages

Subject: Re: About random numbers and hashing

Author: David Rasmussen

Date: 04:30:10 12/07/01

On December 05, 2001 at 15:56:32, Sven Reichard wrote:

>I asked a similar question here before, however without any satisfactory result.
>The following is an argument why the Hamming distance can't be more than a crude
>heuristic, albeit possibly the best we have before. Sorry if it gets a bit
>mathematical.
>
>Assume we associate a bitstring to every piece-square combination. That is
>what's usually done in chess programs; some codes are added for the side to
>move, castling rights, e.p. squares, etc. We obtain the code of a position by
>xor-ing the codes of all the pieces contained in it.
>
>What we want to avoid is collisions at nodes close to the root. For nodes close
>to the leaves the cost of recomputing the score is smaller. Hence we want to
>avoid that
>x1^x2^...^xm = y1^y2^...^yn
>for codes xi, yi and small number m and n, and xi not equal to yj
>
>To translate that to a language that is more familiar - at least for people of a
>mathematical background - we consider the field F2 of two elements. The elements
>are 0 and 1, and we can add and multiply them as usual, with the additional rule
>that 1 + 1 = 0. This is really a field, just like the real or complex numbers,
>and we can do calculations as usual. Note that addition is just the exclusive
>or.
>
>Now the codes or bitstrings become vectors over the field F2, and the bitwise
>exclusive or becomes componentwise addition, i.e., usual addition of vectors.
>All these vectors form the vector space F2^k, where k is the length of the
>vectors. Typically, k = 64.
>
>So, what we want to avoid is an equation
>x1 + x2 + ... + xm = y1 + y2 + ... + yn
>or
>x1 + x2 + ... + xm + y1 + y2 + ... + yn = 0
>since in F2, subtraction is the same as addition. Remembering some linear
>algebra, this just means that we want the set x1,...,xm,y1,...,yn to be linearly
>independent.
>

Yep. I posted similar analysis once.

>This leads to the following criterion for picking a set of hashcodes:
>A set of vectors in F2^k is a good set of hash codes if each small subset of
>non-zero vectors is linearly independent.
>

That is, you want the dimension of this vector space to be as big as possible.
This is the subject of algebraic codes. But that's not the only important
criterion. We also want the keys to spread equally, which implies randomness.

>What is not clear here is the meaning of "small", but we want small to be as big
>as possible. In other words, we consider sets of size up to a certain size as
>small, and if we can make that size bigger, it is better, since this leads to
>unique codes deeper in the tree.
>
>However what is clear is that this quality criterion does not depend on the base
>of the vector space. I.e., if we have a good set and multiply each vector by an
>invertible matrix (in other words, if we rotate the vectors), the obtained set
>will be just as good, since the rotation does not change the linear
>independence.
>

Why is dependance of the basis important? It is not.

>The Hamming distance, on the other hand, is highly dependent on the vector space
>base. Take for example the vectors (1,0) and (0,1) in F2^2; they have Hamming
>distance 2. If we multiply both of them by
>(1 1)
>(0 1)
>we get (1,1) and (0,1), which have Hamming distance 1. Actually we can change
>any distance to anything else (except for 0) by an appropriate matrix.
>

So?

>Thus we try to approximate something that is independent from the base (the
>quality of our hash codes) by something that depends on it (the Hamming
>distance). Simple logic tells you that this approximation has to be real bad.
>

Not my logic :)

>An example where it doesn't work: It has been said that the Hamming distance
>shouldn't be to small or to  big. So, vectors at a distance which is half the
>length should be ok, right? Let the length be 8 (I don't want to type too many
>0's and 1's), and consider the vectors
>11110000
>11001100
>00111100
>They all have weight 4, their pairwise distance is 4, and yet they add up to 0.
>Just by looking at Hamming distances, you have no chance of detecting that.
>

Sure, but that's a contrived example.

>Summarizing I can say that I see no connection between the quality of hash codes
>and their Hamming distance.

Hamming distance is not the only criterion that is important, and as you
yourself point out, you can get some really bad stuff by just going by hamming
distance (although it is unlikely. The example you give is improbable). On the
other hand, the "connection between the quality of hash codes and their Hamming
distance" is quite obvious. If you have very low hamming distance, you are much
more like to get a low-dimensioned vector space. Basic algebraic coding theory,
and also very obvious. If two strings differ only by a few bits, then only a few
xors might transform one into the other, all things being equal. But as I said,
the hamming distance criterion is not enough in itself. At least not to desribe
a good vector space (or algebraic code) for this purpose. It might be good
enough for practical use.

>Using a good RNG like the one provided in GNU's
>stdlib will yield good hash codes ( you can actually prove that), and so I will
>take the codes as they are supplied by rand() or random() without messing with
>them and thereby most likely make them worse.
>

random() is not portable. And even if rand() was good, it isn't portable either,
in the sense that if you want to compile your program on another compiler (gcc
doesn't generate very fast code), you can't be sure that you have an equally
good PRNG, and also, you cannot be sure that the actual numbers are the same
(for the same reason), so you can't use hashkeys in opening books etc. And
besides, gcc's rand() isn't all that great. That's why they have random(). It's
better to roll your own.

/David

>Sorry for the length of this message and my rambling on. Thanks for your
>patience and any feedback. Actually I bet a case of beer that nobody reads up to
>this point (just kidding).
>
>Later,
>Sven.

Re: About random numbers and hashing Sune Fischer 05:45:47 12/07/01
- Re: About random numbers and hashing David Rasmussen 06:31:28 12/07/01
  - Re: About random numbers and hashing Sune Fischer 07:06:59 12/07/01
    - Re: About random numbers and hashing David Rasmussen 12:59:29 12/07/01
      - Re: About random numbers and hashing Sune Fischer 14:34:37 12/07/01
        
        Re: About random numbers and hashing Sven Reichard 09:25:59 12/08/01
        
        Re: About random numbers and hashing David Rasmussen 03:57:28 12/09/01
        
        Re: About random numbers and hashing Sven Reichard 06:40:40 12/10/01
        
        Re: About random numbers and hashing Sune Fischer 04:31:48 12/09/01
        
        Re: About random numbers and hashing David Rasmussen 06:14:10 12/09/01
        
        Re: About random numbers and hashing Ralf Elvsén 15:28:58 12/08/01
        
        Re: About random numbers and hashing Sune Fischer 15:50:45 12/08/01
    - Re: About random numbers and hashing Sune Fischer 07:29:19 12/07/01

This page took 0.01 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.