Computer Chess Club Archives



Subject: Re: About random numbers and hashing

Author: David Rasmussen

Date: 12:59:29 12/07/01



On December 07, 2001 at 10:06:59, Sune Fischer wrote:

>On December 07, 2001 at 09:31:28, David Rasmussen wrote:
>
>>On December 07, 2001 at 08:45:47, Sune Fischer wrote:
>>
>>>On December 07, 2001 at 07:30:10, David Rasmussen wrote:
>>>
>>>>Hamming distance is not the only criterion that matters, and as you
>>>>yourself point out, you can get some really bad sets by going by Hamming
>>>>distance alone (although the example you give is improbable). On the
>>>>other hand, the "connection between the quality of hash codes and their
>>>>Hamming distance" is quite obvious: if the Hamming distances are very low,
>>>>you are much more likely to end up with a low-dimensional vector space.
>>>
>>>Why is that?
>>>
>>
>>Because fewer bits have to change in one vector for it to become another
>>vector (i.e. linear dependence), all other things being equal. It is
>>essentially the same idea as the Hamming bound for error-correcting codes.
>
>You only need to change 1 coordinate to break linear dependence.
>If you change more you can end up where you started.
>
>>>If you create a set of vectors with mutually high Hamming distances, then you
>>>are actually giving them a feature they all have in common.
>>>This works against randomness: some should have large distances and some
>>>small. That is "more random"; random means random in _every_ aspect.
>>>
>>
>>As I've said before, randomness is not important. What's important is high
>>dimensionality and good spread.
>
>You have dim=64 and 64*64*14 random vectors; you will most likely be
>spanning the space already, so that is not the problem.
>

64*64*14? You mean 64*14, I assume. Plus en passant and castling.
Spanning the space is not all you want. Once you have chosen a basis, you want
the rest of your vectors to be linear combinations of as many basis vectors as
possible.
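To make "spanning" concrete: the keys are vectors over GF(2), where XOR is
vector addition, so the dimension spanned by a key set is its rank under
Gaussian elimination. A quick sketch (Python here, purely for illustration;
the function name is my own):

```python
import random

def gf2_rank(keys):
    """Rank of a set of 64-bit integers viewed as vectors over GF(2)
    (XOR is vector addition). Plain Gaussian elimination."""
    basis = [0] * 64  # basis[i]: a reduced key whose highest set bit is i
    rank = 0
    for v in keys:
        for i in range(63, -1, -1):
            if not (v >> i) & 1:
                continue
            if basis[i]:
                v ^= basis[i]  # eliminate bit i, keep scanning lower bits
            else:
                basis[i] = v   # new pivot: v is independent of the basis so far
                rank += 1
                break
    return rank

# 64*14 = 896 random 64-bit keys span all 64 dimensions with overwhelming
# probability -- spanning really isn't the hard part.
rng = random.Random(0)
keys = [rng.getrandbits(64) for _ in range(896)]
print(gf2_rank(keys))  # 64
```

Note that 896 keys in a 64-dimensional space are massively redundant, which is
exactly why the question is not whether they span but how the dependences fall.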

>>Randomness is just a means to that end.
>>Randomness definitely gives good spread if the hash function is AND or MOD.
>>Randomness gives average Hamming distances of 32 with 64-bit strings; minimum
>>Hamming distances are usually about 12-16 in this case. The important thing is
>>that the spread isn't ruined. And it isn't in my case, since I have just found
>>a good seed for my PRNG that makes the minimum Hamming distance 17. The
>>numbers are still randomly distributed and as such have good spread.
>
>No, there is no proof that large Hamming distances would be better for Zobrist
>tables. You saw the great example:
>

I haven't said that large Hamming distances are good. I am saying that small
Hamming distances are bad.
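For what it's worth, the seed hunt I mentioned is nothing fancy: generate the
key set from each candidate seed, compute the minimum pairwise Hamming
distance, and keep the best seed. A sketch (Python for illustration; the
function names are my own):

```python
import random
from itertools import combinations

def min_hamming(keys):
    """Smallest pairwise Hamming distance in a list of 64-bit keys."""
    return min(bin(a ^ b).count("1") for a, b in combinations(keys, 2))

def pick_seed(n_keys, seeds):
    """Return (seed, distance) for the candidate seed whose n_keys random
    64-bit keys have the largest minimum pairwise Hamming distance."""
    best_seed, best_d = None, -1
    for seed in seeds:
        rng = random.Random(seed)
        keys = [rng.getrandbits(64) for _ in range(n_keys)]
        d = min_hamming(keys)
        if d > best_d:
            best_seed, best_d = seed, d
    return best_seed, best_d
```

A real Zobrist table has on the order of 800 keys, so roughly 320,000 pairs per
candidate seed; still cheap, since it only has to run once.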

>11110000
>11001100
>00111100
>
>very large distances, yet they sum to 00000000.

Yes, but these are not very random. They don't have good spread. And I still
haven't said that I want as large a Hamming distance as possible.
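Anyone can check both properties of that example directly (Python, just for
illustration):

```python
a, b, c = 0b11110000, 0b11001100, 0b00111100

dist = lambda x, y: bin(x ^ y).count("1")
print(dist(a, b), dist(a, c), dist(b, c))  # 4 4 4 -- large pairwise distances
print(a ^ b ^ c)                           # 0 -- yet linearly dependent over GF(2)
```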

>The object is to make it as unlikely as possible that this happens for any
>given set of keys, and randomness will do this for you.
>I do not believe you can improve upon that by enforcing larger Hamming
>distances; your intuition may tell you so, but intuition doesn't go well with
>statistics.
>Of course I am being very unscientific about that judgement, but so are you I
>believe ;)
>

I still haven't said that Hamming distances should be as large as possible. I
am saying that they should not be too small.

>
>>Note that I am not saying that high Hamming distances are a good thing. I
>>agree that if you force your numbers to have very high Hamming distances, and
>>that is your only criterion, you will probably ruin their spread. I am just
>>saying that there _is_ a connection between Hamming distances and the
>>dimension of such a vector space.
>
>Prove it ;)
>

Richard Hamming already has. Read up on Hamming codes, the Hamming bound, and
algebraic codes in general.
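The sphere-packing (Hamming) bound itself is easy to evaluate: the Hamming
balls of radius t = (d-1)/2 around codewords must be disjoint. For 64-bit keys
and minimum distance 17 it allows billions of codewords, so a few hundred
Zobrist keys are nowhere near the limit (Python for illustration):

```python
from math import comb

def hamming_bound(n, d):
    """Sphere-packing bound: an upper limit on how many binary codewords of
    length n can have minimum pairwise Hamming distance d. The Hamming balls
    of radius t = (d-1)//2 around the codewords must be disjoint."""
    t = (d - 1) // 2
    return 2 ** n // sum(comb(n, k) for k in range(t + 1))

print(hamming_bound(7, 3))    # 16 -- met exactly by the perfect Hamming(7,4) code
print(hamming_bound(64, 17))  # roughly 3.6 billion
```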

>These vectors are linearly independent:
>(1,0,0,0)
>(0,1,0,0)
>(0,0,1,0)
>(0,0,0,1)
>Yet they have minimal Hamming distances.
>

Yes, but you have 4 vectors in a 4-dimensional space. In our case we have
~15*n vectors in an n-dimensional space. By definition, you can't find 15*n
linearly independent vectors in an n-dimensional vector space. So you can't
hope for all your hash keys to be linearly independent. And so, your vectors
can't be as simple as the ones you present.
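The pigeonhole argument can be seen in miniature: take any 5 distinct nonzero
4-bit vectors, and some subset of them must XOR to zero, because at most 4 of
them can be independent. A brute-force check (Python for illustration; only
sensible at toy sizes):

```python
from itertools import combinations

def dependent_subset(vectors):
    """Return the first subset (size >= 2) whose XOR is zero, or None.
    Over GF(2), such a subset is exactly a linear dependence among
    distinct nonzero vectors."""
    for r in range(2, len(vectors) + 1):
        for subset in combinations(vectors, r):
            acc = 0
            for v in subset:
                acc ^= v
            if acc == 0:
                return subset
    return None

# 5 vectors in a 4-dimensional space: dependence is unavoidable.
print(dependent_subset([0b0001, 0b0010, 0b0100, 0b1000, 0b1111]))
# -> (1, 2, 4, 8, 15)
```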

>>>If you use 64 bit keys there is probably enough randomness left in the keys so
>>>that this is not a problem, but for 32 bit keys it could be a bigger problem.
>>>
>>
>>There is as much randomness in my keys, with minimum Hamming distance 17, as
>>there is in those of anyone who has just taken the first keys that came along
>>and got a minimum Hamming distance of 11. And my keys are less likely to
>>collide.
>
>If they are less likely to collide I think you have increased randomness, so in
>essence you have improved the PRNG.

Why do you think that? Why do you insist that increased randomness is the only
way to minimize collisions?

>Of course I will listen to a scientific/mathematical argument if you can come
>up with one :)
>with one :)
>

I think I have come up with plenty. Or at least as many as you.

/David




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.