Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: About random numbers and hashing

Author: David Rasmussen

Date: 06:31:28 12/07/01

Go up one level in this thread


On December 07, 2001 at 08:45:47, Sune Fischer wrote:

>On December 07, 2001 at 07:30:10, David Rasmussen wrote:
>
>>Hamming distance is not the only criterion that is important, and as you
>>yourself point out, you can get some really bad stuff by just going by hamming
>>distance (although it is unlikely. The example you give is improbable). On the
>>other hand, the "connection between the quality of hash codes and their Hamming
>>distance" is quite obvious. If you have very low hamming distance, you are much
>>more like to get a low-dimensioned vector space.
>
>Why is that?
>

Because less bits will have to change in one vector for it to become another
vector (i.e. linear dependance), all things being equal. It is essentially the
same thing as with the hamming bound for error correcting codes.

>If you create a set of vectors with mutually high hamming distances, then you
>are actually giving them a feature they all have in common.
>This is against randomness, some should have large distances and some small,
>this is "more random", random means random in _every_ aspect.
>

As I've said before, randomness is not important. What's important is high
dimensionality and good spread. Randomness is just a means to that end.
Randomness definately gives good spread if the hash function is AND or MOD.
Randomness gives avg. hammings distances of 32 with 64-bit strings. Minimal
hamming distances are usually about 12-16 in this case. The important thing is
that spread isn't ruined. And it isn't in my case, since I have just found a
good seed for my PRNG that makes minimum hamming distance 17. The numbers are
still randomly distributed and as such has good spread.

Note that I am not saying that high hamming distances are a good thing. I agree
that if you force your numbers to have too high hamming distances, and that is
your only criterion, you will probably fuck up their spread. I am just saying
that there _is_ a connection between hamming distances and the dimension of such
a vector space.

>If you use 64 bit keys there is probably enough randomness left in the keys so
>that this is not a problem, but for 32 bit keys it could be a bigger problem.
>

There is as much randomness in my keys with minimum hamming distance 17 as there
are in anyone who has just taken the first keys that came and has a minimum
hamming distance of 11. And my keys are less likely to collide.

/David



This page took 0.01 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.