Author: Vincent Diepeveen
Date: 06:30:41 12/10/01
On December 08, 2001 at 22:30:31, Wylie Garvin wrote:

>Hi all,
>
> Disclaimer, my math/logic may be broken! I would welcome feedback on this
>message. There's been some discussion on this board of zobrist keys w.r.t.
>hamming distances etc. I want to have a look at it from a different angle.
>
> We have two goals for our zobrist keys:
>(1) minimize the probability of a collision for two arbitrary positions, and

Your goal should read that you want to minimize the probability of a collision
between two positions which are possible to reach from the same search tree.
The real difference is that when hashing 10^40 positions to 2^64 =
18,446,744,073,709,551,616 (about 0.18 * 10^20) keys, you will theoretically
get loads of collisions, whereas in our model we search at most a few billion
positions, which is around 10^9 positions. Clearly a key good for 10^20
positions can theoretically hash 10^9 positions without collisions. So here
there is a big difference.

>(2) spread the keys evenly across the keyspace (I guess this only matters for
>the bottom N bits for some N=24 or something).
>
> NOTE: for the moment, I'm going to assume that each table entry is equally
>likely to be XORed into an arbitrary zobrist key.

This assumption is wrong. The indexing scheme already causes chaining in the
hashtable.

> Now (1) is essentially saying the key should retain as much information as
>possible; in other words, the probability P(n) of bit n being a 1 in a zobrist
>key should be about 0.5.

That's kind of the definition of how you want random numbers to be.

>To accomplish this, we want the probability Q(e,n) to
>be about 0.5 for any table entry e and bit n, since the zobrist key is computed
>by XORing some number of those together; e.g. for two entries it is
>Q(e1,n)*(1-Q(e2,n))+(1-Q(e1,n))*Q(e2,n) or something. If you extend this to a
>combination of more table entries and the terms are not near 0.5, I think it
>quickly oscillates away from 0.5. Notice that choosing truly random numbers for
>table entries will tend to satisfy this requirement.
>
> But the requirement I have just stated essentially says, "the keys must be as
>incompressible as possible by order-0 techniques". But what we REALLY want is
>for them to be as incompressible as possible by order-M techniques for any M <=
>N (I think sqrt(N) is the most important here, can anybody who actually
>understands math confirm/deny that??). So my second requirement is basically
>that, choosing any subset of k bits from a table entry, and XORing them
>together, the probability of getting a 1 should again be about 0.5. Again,
>choosing truly random numbers is the simplest way I can think of to satisfy this
>requirement.

Exactly.

> Now, if we look at the goal (2), it should be satisfied naturally as a side
>effect of any methods which satisfy the goal (1) as above, *provided* all table
>entries are equally likely to be combined into an arbitrary zobrist key. If the
>table entries are NOT equally likely (and they're NOT! pawns are more likely
>nearer the back ranks, several pieces are more likely near the edges or centre
>of the board, there are more pawns than pieces of any kind, etc), then that
>means satisfying goal (1) will probably NOT in itself satisfy goal (2). To
>modify the requirements above, I claim that a *weighted* linear combination of
>the table entries is what you need to think about to get the best results with
>some systematic method. I.e. you need to actually know the probability
>distribution for the table entries if you want to check the properties I
>mentioned.
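To put rough numbers on the collision point at the top: a birthday-bound
estimate says that hashing n distinct positions into 2^64 keys gives roughly
n*(n-1)/(2*2^64) expected colliding pairs. A minimal sketch in C (the
expected_collisions helper and the exact node counts are only assumptions for
the example):

    #include <stdio.h>
    #include <math.h>

    /* Birthday-bound estimate: expected number of colliding pairs when
       hashing n distinct positions into a keyspace of 2^bits values. */
    static double expected_collisions(double n, int bits)
    {
        return n * (n - 1.0) / (2.0 * ldexp(1.0, bits));
    }

    int main(void)
    {
        /* roughly 10^9 nodes actually searched vs. 10^40 reachable positions */
        printf("1e9 searched positions, 64-bit key: %g expected collisions\n",
               expected_collisions(1e9, 64));
        printf("1e40 possible positions, 64-bit key: %g expected collisions\n",
               expected_collisions(1e40, 64));
        return 0;
    }

For a few billion searched positions this comes out to well under one expected
collision per search, while over all 10^40 positions the expected count is
astronomical, which is exactly the difference between the two ways of stating
goal (1).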
>However with much hand-waving, I'll claim that choosing truly random
>numbers tends to satisfy the requirement even though the probabilities are not
>equal, which claim leads to the following brain-damaged reasoning: RANDOMLY
>choosing the values of a certain bit will introduce the least possible bias over
>a certain bit in all table entries w.r.t. the probability distribution of those
>table entries. I.e. if we don't take any account of the probability
>distribution (and it seems that no one has mentioned it recently), then it seems
>that RANDOM numbers will work at least as well as any systematic method.
>
>So I guess what I'm recommending is, use a prng and then check that the results
>are not flawed in any obvious way. Count the number of 1's in a particular bit
>position over the whole table; verify that any two bit positions are not highly
>correlated. Of course the last word can only be had by actually using the thing
>and measuring collision rates.
>
>Oh - one retrospectively obvious thing: if you use the zobrist table for your
>pawn hash, then *that* probability distribution for the table entries has all
>zeros for the non-pawn values. So you could repeat your sanity checks, etc.
>using just the entries for pawns (and perhaps use a model distribution). In
>fact, repeating them over randomly selected subsets of table entries doesn't
>seem like a bad idea.
>
>I guess it would be pretty easy to compute piece-on-square counts over a large
>database of positions, which would make a decent model distribution for testing.
>
>wylie
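To make the recommended sanity checks concrete (count the 1's per bit position
over the whole table, and check that pairs of bit positions are not obviously
correlated), here is a minimal sketch in C. The table size of 768 entries
(12 piece types x 64 squares), the xorshift64 generator and the names next_key,
ones_in_bit and agree are only assumptions for the example; substitute whatever
layout and prng you actually use:

    #include <stdio.h>
    #include <stdint.h>

    #define NUM_ENTRIES 768   /* assumed: 12 piece types x 64 squares */

    static uint64_t zobrist[NUM_ENTRIES];

    /* xorshift64: stands in for whatever prng fills your real table */
    static uint64_t next_key(void)
    {
        static uint64_t s = 0x9E3779B97F4A7C15ULL;
        s ^= s << 13;  s ^= s >> 7;  s ^= s << 17;
        return s;
    }

    /* how many table entries have bit b set; should be near NUM_ENTRIES/2 */
    static int ones_in_bit(int b)
    {
        int n = 0;
        for (int i = 0; i < NUM_ENTRIES; i++)
            n += (int)((zobrist[i] >> b) & 1);
        return n;
    }

    /* how often bits a and b of the same entry agree; should also be
       near NUM_ENTRIES/2 if the two bit positions are uncorrelated */
    static int agree(int a, int b)
    {
        int n = 0;
        for (int i = 0; i < NUM_ENTRIES; i++)
            n += ((((zobrist[i] >> a) ^ (zobrist[i] >> b)) & 1) == 0);
        return n;
    }

    int main(void)
    {
        for (int i = 0; i < NUM_ENTRIES; i++)
            zobrist[i] = next_key();

        for (int b = 0; b < 64; b++)
            printf("bit %2d: %3d ones out of %d entries\n",
                   b, ones_in_bit(b), NUM_ENTRIES);

        printf("bits 0 and 1 agree on %d of %d entries\n",
               agree(0, 1), NUM_ENTRIES);
        return 0;
    }

For 768 random entries each count should land near 384; if many bits deviate
by much more than one standard deviation (about 14 here), that is a reason to
distrust the generator. The same checks can be repeated over just the pawn
entries, or over randomly selected subsets, as suggested above.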