Author: Tom Likens
Date: 11:33:18 09/03/03
Go up one level in this thread
On September 03, 2003 at 14:09:18, Frank Phillips wrote: >On September 03, 2003 at 13:43:13, Tom Likens wrote: > >>On September 03, 2003 at 13:28:20, Frank Phillips wrote: >> >>>On September 03, 2003 at 10:52:35, Tom Likens wrote: >>> >>>>On September 03, 2003 at 02:36:33, Tony Werten wrote: >>>> >>>>>On September 02, 2003 at 13:00:05, Tom Likens wrote: >>>>> >>>>>> >>>>>>This is a general query about an issue I've run into and >>>>>>I'm wondering if anyone else has dealt with it or if I'm >>>>>>just off base. Essentially, the issue is this- recently >>>>>>I started playing around with my hash table's random >>>>>>numbers to see if I could improve them. Currently, for >>>>>>the main table I have a Hamming distance of 24 for >>>>>>roughly 800 random 64-bit values. >>>>>> >>>>>>This is where my inquiry comes in, I use the lower N >>>>>>bits of my hash key as an index into the table. I'm >>>>>>wondering, even though the overall Hamming distance is >>>>>>24 shouldn't I be concerned about the Hamming distance >>>>>>of the lower N bits? >>>>> >>>>>It could be, but it's unlikely it would give problems. >>>>> >>>>>To make your keys better, you should not strive for the biggest average hamming >>>>>distance, but for the biggest minimum hamming distance. >>>>> >>>>>22 is doable, wich should give you a collision every never. >>>> >>>>Tony, >>>> >>>>I was a little imprecise, my *minimum* Hamming distance is 24, >>>>the average is closer to 32. I don't generate my random numbers >>>>on the fly, but instead have a separate program that creates >>>>the numbers and saves them to a static array that becomes part of >>>>the program proper. >>>> >>>>Last night I changed this program slightly to give me a minimum >>>>Hamming distance of 10 on the lower 32-bits (I tried 12 initially >>>>but killed it after four hours of run time, without any results). >>>>It also verified that the overall minimum distance for the 64-bit >>>>values was still 24. >>>> >>>>Anyway, long story short, my collision rate in the repetition >>>>hash table went down significantly. I intend to run a final >>>>experiment tonight to actually measure the collisions for >>>>different 32-bit distances (there has to be a graph in here >>>>somewhere ;) >>>> >>>>regards, >>>>--tom >>>> >>>>> >>>>>Of coarse you still have the risk of the lower part being worse than the upper >>>>>part, but you can just let your computer search a bit longer for a minimum >>>>>hamming distance of 11 in the lower part. >>>>> >>>>>Tony >>>>> >>>>>>If these bits are alike, even >>>>>>though the overal value is reasonable don't it increase >>>>>>the probablity of hash collisions considerably? >>>>>>Of course, I won't get a false match since I still use >>>>>>all 64-bits of the key to indicate if the hash entry >>>>>>is valid, but it's time wasted performing multiple >>>>>>probes into the table. >>>>>> >>>>>>I'm also guessing that this could be more of an issue >>>>>>for the repetition hash table, since it is quite a bit >>>>>>smaller than the main table. Currently, I don't do >>>>>>multiple probes into this table and I've never seen >>>>>>an issue. Still, I'm starting to wonder if there is >>>>>>a problem lurking below the surface that I may >>>>>>have missed. >>>>>> >>>>>>Anyway, I'm probably missing something obvious here. >>>>>>I intend to run a number of experiments this evening, >>>>>>but I was curious if anyone else has given this much >>>>>>thought. >>>>>> >>>>>>regards, >>>>>>--tom >>> >>> >>>Do you have a link to the code (in C) to calculate the hamming distance? >>> >>>I searched a bit, but found nothing. >>> >>>(I used the mersene twister to calcuate 32 bit random numbers that I concatonate >>>to 64 bit. It would be nice to know how 'good' they are.). >>> >>>Frank >> >>If you're interested I'd be happy to send you the program I use >>to calculate my random numbers. It uses the Mersenne Twister or >>another algorithm I ripped out of Booth's __Inner Loops__ book >>that seemed to produce decent random numbers (the default is the >>MT PRNG). >> >>Getting the hamming distance in pseudo-code for N random numbers: >> >>1. Generate a random number (index=M) >> >>2. Compare it to the 0 ... M-1 valid random numbers alread saved >> >> if (popcnt64(new_rand64 ^ array[0..M-1]) >= MIN_HAM) then OK >> >>3. If valid, save it into slot M >> If not valid (hamming distance is too small) goto 1 >> >>4. Repeat until you have N random numbers >> >>This was off-the-cuff, so I may have left something out but it >>should give you the basic idea. >> >>Of course, this will run forever if you pick a Hamming distance >>that is too large, so beware! >> >>regards, >>--tom > > >Thanks. I will give your pseudo code above a try, which unless I misunderstand >counts the number of different bits - irrespective of their position. > >Frank That's right. Another way to think of Hamming distance is, how many bits would you need to flip to turn one number into another. --tom
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.