Author: Sune Fischer
Date: 07:06:59 12/07/01
On December 07, 2001 at 09:31:28, David Rasmussen wrote:

>On December 07, 2001 at 08:45:47, Sune Fischer wrote:
>
>>On December 07, 2001 at 07:30:10, David Rasmussen wrote:
>>
>>>Hamming distance is not the only criterion that is important, and as you
>>>yourself point out, you can get some really bad stuff by just going by Hamming
>>>distance (although it is unlikely; the example you give is improbable). On the
>>>other hand, the "connection between the quality of hash codes and their Hamming
>>>distance" is quite obvious. If you have very low Hamming distance, you are much
>>>more likely to get a low-dimensional vector space.
>>
>>Why is that?
>>
>
>Because fewer bits will have to change in one vector for it to become another
>vector (i.e. linear dependence), all things being equal. It is essentially the
>same thing as with the Hamming bound for error-correcting codes.

You only need to change 1 coordinate to break linear dependence. If you change
more, you can end up where you started.

>>If you create a set of vectors with mutually high Hamming distances, then you
>>are actually giving them a feature they all have in common.
>>This is against randomness; some should have large distances and some small.
>>That is "more random": random means random in _every_ aspect.
>>
>
>As I've said before, randomness is not important. What's important is high
>dimensionality and good spread.

You have dim=64 and you have 64*64*14 random vectors; you will most likely be
spanning the space already, so that is not the problem.

>Randomness is just a means to that end.
>Randomness definitely gives good spread if the hash function is AND or MOD.
>Randomness gives avg. Hamming distances of 32 with 64-bit strings. Minimal
>Hamming distances are usually about 12-16 in this case. The important thing is
>that spread isn't ruined. And it isn't in my case, since I have just found a
>good seed for my PRNG that makes the minimum Hamming distance 17.
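The figures quoted above (average distance 32, minimum usually 12-16 for random 64-bit keys) are easy to reproduce. A small Python sketch; the table size of 768 keys (12 piece types on 64 squares, smaller than the 64*64*14 mentioned above) and the seed are arbitrary choices for illustration:

```python
import random

def min_hamming_distance(keys):
    """Smallest pairwise Hamming distance in a list of integer keys."""
    return min(
        bin(a ^ b).count("1")
        for i, a in enumerate(keys)
        for b in keys[i + 1:]
    )

# Hypothetical Zobrist table: 768 random 64-bit keys from a fixed
# (arbitrary) seed. Searching over seeds, as described above, just
# means repeating this and keeping the seed with the largest minimum.
random.seed(1)
keys = [random.getrandbits(64) for _ in range(768)]
print(min_hamming_distance(keys))
```

The printed minimum will typically land in the 12-16 range the post mentions; trying many seeds and keeping the best is how a minimum of 17 can be found.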
>The numbers are
>still randomly distributed and as such have good spread.

No, there is no proof that large Hamming distances would be better for Zobrist
tables. You saw the great example:

11110000
11001100
00111100

Very large distances, yet they XOR to 00000000. The aim is to make it as
unlikely as possible that this happens for any given set of keys, and randomness
will do this for you. I do not believe you can improve upon that by enforcing
larger Hamming distances. Your intuition may tell you so, but intuition doesn't
go well with statistics. Of course I am being very unscientific about that
judgement, but so are you, I believe ;)

>Note that I am not saying that high Hamming distances are a good thing. I agree
>that if you force your numbers to have too high Hamming distances, and that is
>your only criterion, you will probably fuck up their spread. I am just saying
>that there _is_ a connection between Hamming distances and the dimension of such
>a vector space.

Prove it ;) These vectors are linearly independent:

(1,0,0,0)
(0,1,0,0)
(0,0,1,0)
(0,0,0,1)

Yet they have minimal Hamming distances.

>>If you use 64-bit keys there is probably enough randomness left in the keys so
>>that this is not a problem, but for 32-bit keys it could be a bigger problem.
>>
>
>There is as much randomness in my keys with minimum Hamming distance 17 as
>there is in keys taken straight from the generator with a minimum Hamming
>distance of 11. And my keys are less likely to collide.

If they are less likely to collide, I think you have increased randomness, so in
essence you have improved the PRNG. Of course I will listen to a
scientific/mathematical argument if you can come up with one :)

>/David

-S.
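Both counterexamples above can be checked directly in a few lines of Python, just as a sketch: the three high-distance 8-bit keys XOR to zero (a guaranteed collision), while the unit vectors, despite minimal distances, have no subset that XORs to zero, i.e. they are linearly independent over GF(2).

```python
from functools import reduce
from itertools import combinations

def hamming(a, b):
    """Hamming distance between two bit strings held as ints."""
    return bin(a ^ b).count("1")

# The three 8-bit keys from the post: every pair is at distance 4,
# yet all three XOR to zero, so this combination of "pieces" always
# hashes the same as an empty board.
keys = [0b11110000, 0b11001100, 0b00111100]
print([hamming(a, b) for a, b in combinations(keys, 2)])  # [4, 4, 4]
print(reduce(lambda x, y: x ^ y, keys))  # 0: guaranteed collision

# The unit vectors from the post: pairwise distance only 2, yet no
# subset XORs to zero, so they are linearly independent over GF(2).
units = [0b1000, 0b0100, 0b0010, 0b0001]
dependent = [
    subset
    for r in range(2, len(units) + 1)
    for subset in combinations(units, r)
    if reduce(lambda x, y: x ^ y, subset) == 0
]
print(dependent)  # []: no linearly dependent subset
```

This is exactly why Hamming distance alone says nothing about the dimension of the spanned space: a dependent set can have large distances, and an independent set can have small ones.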
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.