Author: Wylie Garvin
Date: 19:30:31 12/08/01
Hi all,

Disclaimer: my math/logic may be broken! I would welcome feedback on this message.

There has been some discussion on this board of Zobrist keys w.r.t. Hamming distances etc. I want to look at it from a different angle. We have two goals for our Zobrist keys: (1) minimize the probability of a collision between two arbitrary positions, and (2) spread the keys evenly across the keyspace (I guess this only matters for the bottom N bits actually used to index the hash table, say N = 24 or so).

NOTE: for the moment, I'm going to assume that each table entry is equally likely to be XORed into an arbitrary Zobrist key.

Now, (1) essentially says the key should retain as much information as possible; in other words, the probability P(n) of bit n being a 1 in a Zobrist key should be about 0.5. To accomplish this, we want the probability Q(e,n) to be about 0.5 for any table entry e and bit n, since the Zobrist key is computed by XORing some number of those entries together; e.g. for two entries the probability of a 1 is Q(e1,n)*(1-Q(e2,n)) + (1-Q(e1,n))*Q(e2,n). (For independent entries, the deviation of that result from 0.5 is the product of the individual deviations times a power of two, so if each Q(e,n) is close to 0.5 the combined bias shrinks as more entries are XORed in, and the key bits stay near 0.5.) Notice that choosing truly random numbers for the table entries will tend to satisfy this requirement.

But the requirement I have just stated essentially says, "the keys must be as incompressible as possible by order-0 techniques". What we REALLY want is for them to be as incompressible as possible by order-M techniques for any M <= N (I think sqrt(N) is the most important here; can anybody who actually understands the math confirm or deny that?). So my second requirement is basically that, choosing any subset of k bits from a table entry and XORing them together, the probability of getting a 1 should again be about 0.5. Again, choosing truly random numbers is the simplest way I can think of to satisfy this requirement.

Now, goal (2) should be satisfied naturally as a side effect of any method that satisfies goal (1) above, *provided* all table entries are equally likely to be combined into an arbitrary Zobrist key. If the table entries are NOT equally likely (and they're not: pawns are more likely nearer the back ranks, several pieces are more likely near the edges or centre of the board, there are more pawns than pieces of any one kind, etc.), then satisfying goal (1) will probably NOT in itself satisfy goal (2). To modify the requirements above, I claim that a *weighted* combination of the table entries (weighted by how likely each entry is to actually occur) is what you need to think about to get the best results with some systematic method; i.e. you need to actually know the probability distribution over the table entries if you want to check the properties I mentioned. However, with much hand-waving, I'll claim that choosing truly random numbers tends to satisfy the requirement even though the probabilities are not equal, which leads to the following brain-damaged reasoning: RANDOMLY choosing the value of a certain bit will introduce the least possible bias in that bit across all table entries w.r.t. the probability distribution of those entries. I.e. if we don't take any account of the probability distribution (and it seems that no one has mentioned it recently), then RANDOM numbers will work at least as well as any systematic method.

So I guess what I'm recommending is: use a PRNG, and then check that the results are not flawed in any obvious way.
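For concreteness, a minimal C sketch of that recommendation. The 12x64 layout, the identifier names, and the choice of splitmix64 as the PRNG are just illustrative assumptions (any decent 64-bit generator would do), and side-to-move, castling and en-passant entries are omitted for brevity:

#include <stdint.h>

/* Hypothetical table layout: 6 piece types x 2 colours, 64 squares. */
#define PIECES  12
#define SQUARES 64

static uint64_t zobrist[PIECES][SQUARES];

/* splitmix64: a small, well-distributed 64-bit PRNG. */
static uint64_t splitmix64(uint64_t *state)
{
    uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

static void init_zobrist(uint64_t seed)
{
    for (int p = 0; p < PIECES; p++)
        for (int s = 0; s < SQUARES; s++)
            zobrist[p][s] = splitmix64(&seed);
}

/* The key for a position is just the XOR of the entries for every occupied
   (piece, square) pair; piece_on[s] is the piece index on square s, -1 if empty. */
static uint64_t position_key(const int piece_on[SQUARES])
{
    uint64_t key = 0;
    for (int s = 0; s < SQUARES; s++)
        if (piece_on[s] >= 0)
            key ^= zobrist[piece_on[s]][s];
    return key;
}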
For example: count the number of 1s in each bit position over the whole table, and verify that no two bit positions are highly correlated. Of course, the last word can only be had by actually using the table and measuring collision rates.

Oh, one retrospectively obvious thing: if you use the Zobrist table for your pawn hash, then *that* probability distribution over the table entries puts zero probability on all the non-pawn entries. So you could repeat your sanity checks using just the entries for pawns (and perhaps use a model distribution). In fact, repeating them over randomly selected subsets of table entries doesn't seem like a bad idea. I guess it would be pretty easy to compute piece-on-square counts over a large database of positions, which would make a decent model distribution for testing.

wylie
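To make those two checks concrete, a rough C sketch; the function names are made up and the thresholds are arbitrary. It takes the table as a flat array of n 64-bit entries (e.g. &zobrist[0][0] with n = 12*64 from the sketch further up):

#include <stdint.h>
#include <stdio.h>

/* Order-0 check: the count of 1s in each bit column should be near n/2. */
static void check_bit_counts(const uint64_t *e, int n)
{
    for (int bit = 0; bit < 64; bit++) {
        int ones = 0;
        for (int i = 0; i < n; i++)
            ones += (int)((e[i] >> bit) & 1);
        printf("bit %2d: %d ones (expect about %d)\n", bit, ones, n / 2);
    }
}

/* Pairwise check: two unbiased, independent bit columns should agree on
   roughly half of the entries; flag pairs that are far from that. */
static void check_bit_pairs(const uint64_t *e, int n)
{
    for (int i = 0; i < 64; i++) {
        for (int j = i + 1; j < 64; j++) {
            int agree = 0;
            for (int k = 0; k < n; k++)
                if ((((e[k] >> i) ^ (e[k] >> j)) & 1) == 0)
                    agree++;
            if (agree < 3 * n / 8 || agree > 5 * n / 8)   /* crude threshold */
                printf("bits %2d and %2d look correlated: %d/%d agree\n",
                       i, j, agree, n);
        }
    }
}

The same two functions can be rerun over just the pawn entries, or over randomly chosen subsets of the table, along the lines suggested above.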