Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Hamming distance and lower hash table indexing

Author: Tom Likens

Date: 11:33:18 09/03/03

Go up one level in this thread


On September 03, 2003 at 14:09:18, Frank Phillips wrote:

>On September 03, 2003 at 13:43:13, Tom Likens wrote:
>
>>On September 03, 2003 at 13:28:20, Frank Phillips wrote:
>>
>>>On September 03, 2003 at 10:52:35, Tom Likens wrote:
>>>
>>>>On September 03, 2003 at 02:36:33, Tony Werten wrote:
>>>>
>>>>>On September 02, 2003 at 13:00:05, Tom Likens wrote:
>>>>>
>>>>>>
>>>>>>This is a general query about an issue I've run into and
>>>>>>I'm wondering if anyone else has dealt with it or if I'm
>>>>>>just off base.  Essentially, the issue is this- recently
>>>>>>I started playing around with my hash table's random
>>>>>>numbers to see if I could improve them.  Currently, for
>>>>>>the main table I have a Hamming distance of 24 for
>>>>>>roughly 800 random 64-bit values.
>>>>>>
>>>>>>This is where my inquiry comes in, I use the lower N
>>>>>>bits of my hash key as an index into the table.  I'm
>>>>>>wondering, even though the overall Hamming distance is
>>>>>>24 shouldn't I be concerned about the Hamming distance
>>>>>>of the lower N bits?
>>>>>
>>>>>It could be, but it's unlikely it would give problems.
>>>>>
>>>>>To make your keys better, you should not strive for the biggest average hamming
>>>>>distance, but for the biggest minimum hamming distance.
>>>>>
>>>>>22 is doable, wich should give you a collision every never.
>>>>
>>>>Tony,
>>>>
>>>>I was a little imprecise, my *minimum* Hamming distance is 24,
>>>>the average is closer to 32.  I don't generate my random numbers
>>>>on the fly, but instead have a separate program that creates
>>>>the numbers and saves them to a static array that becomes part of
>>>>the program proper.
>>>>
>>>>Last night I changed this program slightly to give me a minimum
>>>>Hamming distance of 10 on the lower 32-bits (I tried 12 initially
>>>>but killed it after four hours of run time, without any results).
>>>>It also verified that the overall minimum distance for the 64-bit
>>>>values was still 24.
>>>>
>>>>Anyway, long story short, my collision rate in the repetition
>>>>hash table went down significantly.  I intend to run a final
>>>>experiment tonight to actually measure the collisions for
>>>>different 32-bit distances (there has to be a graph in here
>>>>somewhere ;)
>>>>
>>>>regards,
>>>>--tom
>>>>
>>>>>
>>>>>Of coarse you still have the risk of the lower part being worse than the upper
>>>>>part, but you can just let your computer search a bit longer for a minimum
>>>>>hamming distance of 11 in the lower part.
>>>>>
>>>>>Tony
>>>>>
>>>>>>If these bits are alike, even
>>>>>>though the overal value is reasonable don't it increase
>>>>>>the probablity of hash collisions considerably?
>>>>>>Of course, I won't get a false match since I still use
>>>>>>all 64-bits of the key to indicate if the hash entry
>>>>>>is valid, but it's time wasted performing multiple
>>>>>>probes into the table.
>>>>>>
>>>>>>I'm also guessing that this could be more of an issue
>>>>>>for the repetition hash table, since it is quite a bit
>>>>>>smaller than the main table.  Currently, I don't do
>>>>>>multiple probes into this table and I've never seen
>>>>>>an issue.  Still, I'm starting to wonder if there is
>>>>>>a problem lurking below the surface that I may
>>>>>>have missed.
>>>>>>
>>>>>>Anyway, I'm probably missing something obvious here.
>>>>>>I intend to run a number of experiments this evening,
>>>>>>but I was curious if anyone else has given this much
>>>>>>thought.
>>>>>>
>>>>>>regards,
>>>>>>--tom
>>>
>>>
>>>Do you have a link to the code (in C) to calculate the hamming distance?
>>>
>>>I searched a bit, but found nothing.
>>>
>>>(I used the mersene twister to calcuate 32 bit random numbers that I concatonate
>>>to 64 bit.  It would be nice to know how 'good' they are.).
>>>
>>>Frank
>>
>>If you're interested I'd be happy to send you the program I use
>>to calculate my random numbers.  It uses the Mersenne Twister or
>>another algorithm I ripped out of Booth's __Inner Loops__ book
>>that seemed to produce decent random numbers (the default is the
>>MT PRNG).
>>
>>Getting the hamming distance in pseudo-code for N random numbers:
>>
>>1. Generate a random number (index=M)
>>
>>2. Compare it to the 0 ... M-1 valid random numbers alread saved
>>
>>   if (popcnt64(new_rand64 ^ array[0..M-1]) >= MIN_HAM) then OK
>>
>>3. If valid, save it into slot M
>>   If not valid (hamming distance is too small) goto 1
>>
>>4. Repeat until you have N random numbers
>>
>>This was off-the-cuff, so I may have left something out but it
>>should give you the basic idea.
>>
>>Of course, this will run forever if you pick a Hamming distance
>>that is too large, so beware!
>>
>>regards,
>>--tom
>
>
>Thanks.  I will give your pseudo code above a try, which unless I misunderstand
>counts the number of different bits - irrespective of their position.
>
>Frank

That's right.  Another way to think of Hamming distance is, how many
bits would you need to flip to turn one number into another.

--tom



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.