Computer Chess Club Archives


Subject: Look at zobrist keys from information-theoretic point of view.

Author: Wylie Garvin

Date: 19:30:31 12/08/01


Hi all,

   Disclaimer: my math/logic may be broken!  I would welcome feedback on this
message.  There's been some discussion on this board of zobrist keys w.r.t.
Hamming distances etc.  I want to have a look at it from a different angle.

   We have two goals for our zobrist keys:
(1) minimize the probability of a collision between two arbitrary positions, and
(2) spread the keys evenly across the keyspace.  (I guess this mostly matters for
the bottom N bits, N=24 or so, since those are the bits that actually index the
hash table; see the little sketch just below.)
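
   A little sketch of what I mean by that, with made-up names (TT_BITS would be
whatever your transposition table size happens to be):

  #define TT_BITS 24                          /* hypothetical table: 2^24 slots */
  #define TT_MASK ((1UL << TT_BITS) - 1)

  unsigned long tt_index(unsigned long long key)
  {
      return (unsigned long)(key & TT_MASK);  /* only the low TT_BITS bits matter */
  }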

   NOTE: for the moment, I'm going to assume that each table entry is equally
likely to be XORed into an arbitrary zobrist key.

   Now (1) is essentially saying the key should retain as much information as
possible; in other words, the probability P(n) of bit n being a 1 in a zobrist
key should be about 0.5.  To accomplish this, we want the probability Q(e,n) to
be about 0.5 for any table entry e and bit n, since the zobrist key is computed
by XORing some number of those together; e.g. for two independent entries it is
Q(e1,n)*(1-Q(e2,n)) + (1-Q(e1,n))*Q(e2,n).  If you extend this to a combination
of k table entries, the result oscillates around 0.5, but the size of the bias
is half the product of the individual imbalances |1 - 2*Q(ei,n)|, so unless each
term is near 0.5 the bias stays noticeable for keys built from only a handful of
entries.  Notice that choosing truly random numbers for the table entries will
tend to satisfy this requirement.
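
   To see the numbers, here is a throwaway little program (mine, nothing from
any engine; q is just a made-up per-entry bias) that iterates the two-entry
formula above.  The bias of the key bit flips sign and shrinks by a factor of
|1-2q| for each entry XORed in, so with q well away from 0.5 it takes quite a
few entries before the bit looks fair:

  #include <stdio.h>

  int main(void)
  {
      double q = 0.7;   /* assumed probability that this bit is 1 in every entry */
      double p = 0.0;   /* P(bit is 1 in the key) after XORing k entries */
      int k;

      for (k = 1; k <= 8; k++) {
          p = p * (1.0 - q) + (1.0 - p) * q;   /* the same XOR formula as above */
          printf("k=%d  P=%.4f  bias=%+.4f\n", k, p, p - 0.5);
      }
      return 0;
  }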

   But the requirement I have just stated essentially says, "the keys must be as
incompressible as possible by order-0 techniques".  What we REALLY want is for
them to be as incompressible as possible by order-M techniques for any M <= N
(I think sqrt(N) is the most important here; can anybody who actually
understands math confirm/deny that??).  So my second requirement is basically
this: choose any subset of k bits from a table entry and XOR them together; the
probability of getting a 1 should again be about 0.5.  Again, choosing truly
random numbers is the simplest way I can think of to satisfy this requirement.
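
   A cheap way to spot-check that second requirement (untested sketch, and
zobrist[] / N_KEYS are just placeholders for however your table is laid out):
pick a random mask of bit positions, take the parity of entry & mask for every
entry, and see whether roughly half of them come out 1:

  #include <stdio.h>
  #include <stdlib.h>

  #define N_KEYS 768                  /* e.g. 12 piece types * 64 squares */
  extern unsigned long long zobrist[N_KEYS];

  static int parity64(unsigned long long x)
  {
      int p = 0;
      while (x) { p ^= 1; x &= x - 1; }     /* parity of the set bits */
      return p;
  }

  void subset_balance_test(int trials)
  {
      int t, i;
      for (t = 0; t < trials; t++) {
          unsigned long long mask = ((unsigned long long)rand() << 40)
                                  ^ ((unsigned long long)rand() << 20)
                                  ^  (unsigned long long)rand();
          int ones = 0;
          for (i = 0; i < N_KEYS; i++)
              ones += parity64(zobrist[i] & mask);
          printf("mask %016llx: %d of %d parities are 1\n", mask, ones, N_KEYS);
      }
  }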

   Now, goal (2) should be satisfied naturally as a side effect of any method
which satisfies goal (1) above, *provided* all table entries are equally likely
to be combined into an arbitrary zobrist key.  If the table entries are NOT
equally likely (and they're NOT!  pawns are more likely nearer the back ranks,
several pieces are more likely near the edges or centre of the board, there are
more pawns than pieces of any one kind, etc.), then satisfying goal (1) will
probably NOT in itself satisfy goal (2).  To modify the requirements above, I
claim that a *weighted* combination of the table entries is what you need to
think about to get the best results with some systematic method.  I.e. you need
to actually know the probability distribution of the table entries if you want
to check the properties I mentioned.

   However, with much hand-waving, I'll claim that choosing truly random numbers
tends to satisfy the requirement even though the probabilities are not equal.
The (admittedly brain-damaged) reasoning: choosing the values of a given bit
RANDOMLY across all table entries introduces the least possible bias w.r.t.
whatever the probability distribution of those entries turns out to be.  I.e. if
we don't take any account of the probability distribution (and it seems that no
one has mentioned it recently), then RANDOM numbers will work at least as well
as any systematic method.
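
   For what the weighted version might look like in practice, here is a rough,
untested sketch (names are mine; weight[e] would come from some model of how
often entry e actually gets XORed into a key):

  #include <stdio.h>

  /* For each bit position, the weighted fraction of entries with a 1 there
     should come out near 0.5 if the table is unbiased w.r.t. the model
     distribution. */
  void weighted_bit_balance(const unsigned long long *zobrist,
                            const double *weight, int n_keys)
  {
      int n, e;
      double total = 0.0;

      for (e = 0; e < n_keys; e++)
          total += weight[e];

      for (n = 0; n < 64; n++) {
          double ones = 0.0;
          for (e = 0; e < n_keys; e++)
              if ((zobrist[e] >> n) & 1ULL)
                  ones += weight[e];
          printf("bit %2d: weighted fraction of 1s = %.3f\n", n, ones / total);
      }
  }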

So I guess what I'm recommending is: use a PRNG, and then check that the results
are not flawed in any obvious way.  Count the number of 1s in each bit position
over the whole table; verify that no two bit positions are highly correlated.
Of course, the last word can only be had by actually using the thing and
measuring collision rates.
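
   In code, those two checks could look something like this (untested sketch;
again zobrist[] and N_KEYS are placeholders, and the table is assumed to have
been filled already from whatever PRNG you trust):

  #include <stdio.h>

  #define N_KEYS 768
  extern unsigned long long zobrist[N_KEYS];

  void sanity_checks(void)
  {
      int a, b, i;

      /* 1. ones-count per bit position: expect roughly N_KEYS / 2 */
      for (b = 0; b < 64; b++) {
          int ones = 0;
          for (i = 0; i < N_KEYS; i++)
              ones += (int)((zobrist[i] >> b) & 1ULL);
          printf("bit %2d: %d ones out of %d\n", b, ones, N_KEYS);
      }

      /* 2. for each pair of bit positions, count how often they agree across
            the table; anything far from N_KEYS / 2 looks suspicious */
      for (a = 0; a < 64; a++)
          for (b = a + 1; b < 64; b++) {
              int agree = 0;
              for (i = 0; i < N_KEYS; i++)
                  agree += (int)(((zobrist[i] >> a) & 1ULL) ==
                                 ((zobrist[i] >> b) & 1ULL));
              if (agree < N_KEYS / 3 || agree > 2 * N_KEYS / 3)
                  printf("bits %2d and %2d look correlated: %d/%d agree\n",
                         a, b, agree, N_KEYS);
          }
  }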

Oh - one retrospectively obvious thing: if you use the zobrist table for your
pawn hash, then *that* probability distribution over the table entries assigns
zero probability to all the non-pawn entries.  So you could repeat your sanity
checks, etc. using just the entries for pawns (and perhaps use a model
distribution).  In fact, repeating them over randomly selected subsets of table
entries doesn't seem like a bad idea.

I guess it would be pretty easy to compute piece-on-square counts over a large
database of positions, which would make a decent model distribution for testing.

wylie


