Author: David Rasmussen
Date: 12:59:29 12/07/01
On December 07, 2001 at 10:06:59, Sune Fischer wrote:

>On December 07, 2001 at 09:31:28, David Rasmussen wrote:
>
>>On December 07, 2001 at 08:45:47, Sune Fischer wrote:
>>
>>>On December 07, 2001 at 07:30:10, David Rasmussen wrote:
>>>
>>>>Hamming distance is not the only criterion that is important, and as you
>>>>yourself point out, you can get some really bad stuff by going by Hamming
>>>>distance alone (although it is unlikely; the example you give is improbable).
>>>>On the other hand, the "connection between the quality of hash codes and
>>>>their Hamming distance" is quite obvious. If you have a very low Hamming
>>>>distance, you are much more likely to get a low-dimensional vector space.
>>>
>>>Why is that?
>>>
>>
>>Because fewer bits will have to change in one vector for it to become another
>>vector (i.e. linear dependence), all things being equal. It is essentially the
>>same thing as the Hamming bound for error-correcting codes.
>
>You only need to change 1 coordinate to break linear dependence.
>If you change more you can end up where you started.
>
>>>If you create a set of vectors with mutually high Hamming distances, then you
>>>are actually giving them a feature they all have in common.
>>>This is against randomness; some should have large distances and some small.
>>>That is "more random": random means random in _every_ aspect.
>>>
>>
>>As I've said before, randomness is not important. What's important is high
>>dimensionality and good spread.
>
>You have dim=64 and you have 64*64*14 random vectors; you will most likely be
>spanning the space already. That is not the problem.
>

64*64*14? You mean 64*14, I assume. Plus en passant and castling.

Spanning the space is not all you want. When you have chosen a basis, you want
the rest of your vectors to be a linear combination of as many basis vectors as
possible.

>>Randomness is just a means to that end.
>>Randomness definitely gives good spread if the hash function is AND or MOD.
>>Randomness gives average Hamming distances of 32 with 64-bit strings. Minimal
>>Hamming distances are usually about 12-16 in this case. The important thing is
>>that spread isn't ruined. And it isn't in my case, since I have just found a
>>good seed for my PRNG that makes the minimum Hamming distance 17. The numbers
>>are still randomly distributed and as such have good spread.
>
>No, there is no proof that large Hamming distances would be better for Zobrist
>tables. You saw the great example:
>

I haven't said large Hamming distances are good. I am saying that small Hamming
distances are bad.

>11110000
>11001100
>00111100
>
>very large distances, yet they sum to 00000000.

Yes, but these are not very random. They don't have good spread. And I still
haven't said that I want as large a Hamming distance as possible.

>The object is to make it as unlikely as possible that this should happen for
>any given set of keys, and randomness will do this for you.
>I do not believe you can improve upon that by enforcing larger Hamming
>distances. Your intuition may tell you so, but intuition doesn't go well with
>statistics.
>Of course I am being very unscientific about that judgement, but so are you, I
>believe ;)
>

I still haven't said that Hamming distances should be as large as possible. I
am saying that they should not be too small.

>>Note that I am not saying that high Hamming distances are a good thing. I
>>agree that if you force your numbers to have too high Hamming distances, and
>>that is your only criterion, you will probably ruin their spread. I am just
>>saying that there _is_ a connection between Hamming distances and the
>>dimension of such a vector space.
>
>Prove it ;)
>

Richard Hamming already has. Read up on Hamming codes, the Hamming bound, and
algebraic codes in general.

>These vectors are linearly independent:
>(1,0,0,0)
>(0,1,0,0)
>(0,0,1,0)
>(0,0,0,1)
>Yet they have minimal Hamming distances.
>

Yes, but you have 4 vectors in a 4-dimensional space.
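[Editor's note: both claims quoted above are easy to check empirically: Sune's three-key counterexample really does XOR to zero despite large pairwise distances, and random 64-bit keys really do give an average distance near 32 with a much smaller minimum over a table-sized set. A quick sketch; the seed and the table size of 768 are arbitrary choices of mine:]

```python
import itertools
import random

def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

# The three 8-bit keys from the post: every pair is at distance 4,
# yet the three XOR to zero, so this key set guarantees a collision.
keys = [0b11110000, 0b11001100, 0b00111100]
assert keys[0] ^ keys[1] ^ keys[2] == 0
assert all(hamming(a, b) == 4 for a, b in itertools.combinations(keys, 2))

# Random 64-bit keys: average pairwise distance sits near 32, while
# the minimum over the whole table is typically somewhere in the teens.
random.seed(0)  # arbitrary seed, for reproducibility
table = [random.getrandbits(64) for _ in range(768)]
dists = [hamming(a, b) for a, b in itertools.combinations(table, 2)]
print(min(dists), sum(dists) / len(dists))
```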
In our case we have ~15*n vectors in an n-dimensional space. By definition, you
can't find 15*n linearly independent vectors in an n-dimensional vector space.
So you can't hope for all your hash keys to be linearly independent, and your
vectors can't be as simple as the ones you present.

>>>If you use 64-bit keys there is probably enough randomness left in the keys
>>>so that this is not a problem, but for 32-bit keys it could be a bigger
>>>problem.
>>>
>>
>>There is as much randomness in my keys with minimum Hamming distance 17 as
>>there is in the keys of anyone who has just taken the first keys that came
>>along and has a minimum Hamming distance of 11. And my keys are less likely
>>to collide.
>
>If they are less likely to collide I think you have increased randomness, so in
>essence you have improved the PRNG.

Why do you think that? Why do you insist that increased randomness is the only
way to minimize collisions?

>Of course I will listen to a scientific/mathematical argument if you can come
>up with one :)
>

I think I have come up with plenty. Or at least as many as you.

/David
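[Editor's note: the dimension-counting argument above can be made mechanical. Treating each 64-bit Zobrist key as a vector over GF(2), a hypothetical table of 64 squares times 12 piece types gives 768 keys (my round figure, not the thread's), whose rank cannot exceed 64; so most keys are necessarily linear combinations of others, and some set of positions XORs to zero. A sketch using Gaussian elimination over GF(2):]

```python
import random

def gf2_rank(vectors, bits=64):
    """Rank of a set of bit-vectors over GF(2), by Gaussian elimination."""
    basis = [0] * bits  # basis[i] holds a stored vector whose top set bit is i
    rank = 0
    for v in vectors:
        for bit in range(bits - 1, -1, -1):
            if not (v >> bit) & 1:
                continue
            if basis[bit]:
                v ^= basis[bit]   # eliminate the leading bit, keep reducing
            else:
                basis[bit] = v    # new independent direction found
                rank += 1
                break
    return rank                   # vectors reduced to 0 were dependent

# Hypothetical Zobrist table: 64 squares * 12 piece types = 768 keys.
random.seed(0)  # arbitrary seed, for reproducibility
zobrist_keys = [random.getrandbits(64) for _ in range(768)]

# 768 vectors in a 64-dimensional space: the rank is capped at 64, so at
# least 704 keys are linear combinations of the others, no matter how the
# keys were chosen.
assert gf2_rank(zobrist_keys) == 64
```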