Computer Chess Club Archives



Subject: Re: zobrist hashing

Author: Robert Hyatt

Date: 15:18:54 05/31/99



On May 30, 1999 at 10:07:16, Ricardo Gibert wrote:

>On May 29, 1999 at 23:33:06, Robert Hyatt wrote:
>
>>On May 29, 1999 at 22:29:15, Ricardo Gibert wrote:
>>
>>>Section 4.6 in Algorithms + Data Structures = Programs by Wirth is correct.
>>>
>>>The 1983 edition of Algorithms by Sedgewick gives the typical incorrect
>>>exposition of the topic, which you seem to echo rather closely.
>>>
>>>The 1984 Edition of Fundamentals of Computer Algorithms by Horowitz & Sahni I
>>>have is also incorrect.
>>>
>>>I'll check Knuth when I pass by the bookstore today.  I'm pretty sure he gets it
>>>right.  He is very meticulous & comprehensive of course.  Not very good at
>>>explanations though.  Not a good writer.  He seems to set the standard,
>>>nevertheless.  I'll probably wind up posting a quote from Wirth even though it
>>>looks like it will be overly long. They don't let you use a copier in bookstores
>>>unfortunately.
>>>
>>>Just think about what you are saying.  "M mod N" would then be a hashing
>>>function according to you.
>>
>>
>>M mod N is most definitely a hashing function.  It is a poor one because only
>>the low order bits play a part in the result.  But hashing it is.  As is the
>>Zobrist approach... as is the approach of summing the characters in a name, and
>>so forth...
>>
>>But as I said, what would be the purpose of mapping an N bit value into an
>>N bit value?  It still gets used as an index into a hashed 'thing'.
>>
>>>  A poor one, because of all the collisions.  Hashing
>>>is a strategy to spread out keys to avoid collisions when you map from M to N.
>>>Mapping function is the term you defined.  All B-trees are mapping functions
>>>too, but are not hash functions as you imply.  They fit your "definition".
>>
>>B-trees have nothing to do with hashing.  They don't 'map' anything at all
>>by my definition, i.e., the idea of 'hash=F(value)' where value is bigger than
>>the result 'hash'.
>>
>>
>>
>>
>>>
>>>What happens in computer science, as in many fields, is terminology &
>>>definitions are used "loosely".  It is very awkward & laborious to be precise
>>>all the time.  Particularly in conversation.  The trouble is the listener/reader
>>>does not know when the writer/speaker is being precise or is using the term
>>>loosely.  So the facts get "hashed" and you have many books treating topics
>>>incorrectly.  I can give many examples in sorting & numerical analysis if you
>>>are interested.
>>
>>
>>I don't see a 'loose definition'.  Unless you map from a large -> small value,
>>collisions don't make sense.  And if you get collisions when mapping from X to
>>X' where the lengths are the same, I fail to see the purpose.  In the books
>>I have, typically on data structures, hashing includes a discussion about
>>collisions, chaining, buckets, and ways to handle collisions.  Collisions only
>>make sense when mapping large things into a small address space.  I am used
>>to the idea of a mapping function X'=F(X), and for this mapping function to
>>be called a 'hashing function', X' < X for all X, usually significantly less,
>>as in chess board -> 64 bits or 32 bits.
>>
>>>
>>>By the way, I used to drive my college profs crazy with this type of problem.
>>>So you have my sympathies!
>
>Quoting from Wirth:
>"4.6 Key transformations (Hashing)"
>.
>.   <text skipped here>
>.
>"...Clearly, in a computer store each item is ultimately accessed by specifying
>a storage address 'a'.  Hence, the stated problem is essentially one of finding
>an appropriate mapping H of keys (K) into addresses (A): H: K -> A
>In section 4.5 this mapping was implemented in the form of various list and tree
>search algorithms based on different underlying data organizations.  Here we
>present yet another approach that is basically simple and very efficient in many
>cases.  The fact that..."
>This contradicts what you said and agrees with what I said.

No it doesn't.  _IF_ the original 'part/stock' number is usable as an address,
no hashing is needed.  But stock numbers are generally large numbers because
they are used to number _all_ products from _all_ manufacturers.  Now we want
to 'squeeze' that big number down into a small index that can be used to
directly probe our small local database.  That has _always_ been the purpose
of hashing.  If the number is _already_ usable as an index, why in the world
would we want to transform it again?  No reason at all.  Only if the number is
too big... and whenever you transform a big number into a small number, that is
a many-to-one mapping that can have collisions.
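The 'squeeze' described here can be sketched in a few lines of Python (the
table size and part numbers are invented for illustration):

```python
TABLE_SIZE = 1024  # small local table; the real size is an implementation choice

def stock_index(part_number):
    """Hash: squeeze an arbitrarily large stock number into a small table index."""
    return part_number % TABLE_SIZE

# The mapping is many-to-one, so distinct part numbers can collide:
# 5 and 1029 are different parts, but both land on index 5.
assert stock_index(5) == stock_index(5 + TABLE_SIZE)
```

Any reduction from a larger key space to a smaller index space behaves this
way, which is why the collision-handling machinery (chaining, buckets) always
travels with it.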




>   "The data organization used in this technique is the array structure.  H is
>therefore a mapping transforming keys into array indices, which is the reason
>for the term 'key transformation' that is generally used for this technique."
>Wirth does not use the term "hash" for the 1 1/3 pages until discussing the
>scattering technique which IS hashing.
>.
>.     <text skipped here>
>.
>   "The fundamental difficulty in using key transformation is that the set of
>possible key values is very much larger than the set of available store
>addresses (array indices).  A typical example..."
>.

And that is the issue.  "large address space" --> "small address space" is
what _everyone_ calls "hashing".



>.     <text skipped here>
>.
>"The function H is therefore obviously a many-to-one function."
>Wirth still does not call this hashing as you do.  He now discusses collisions &
>how to deal with them, which I now skip.


Calling this 'mapping' is perfectly acceptable.  He did call it a many-to-one
mapping, which is the typical definition of 'hashing' as well.
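The Zobrist approach from the subject line is exactly such a many-to-one
mapping: a position is folded into a fixed-width key by XORing one random
number per (piece, square) pair.  A minimal sketch, with a made-up piece
encoding (0-11) and a seeded generator so the keys are reproducible:

```python
import random

rng = random.Random(0)  # fixed seed: reproducible keys for this sketch

# One 64-bit random key per (piece type, square): 12 piece types x 64 squares.
ZOBRIST = [[rng.getrandbits(64) for _ in range(64)] for _ in range(12)]

def position_hash(pieces):
    """pieces: iterable of (piece, square) pairs; returns a 64-bit key."""
    h = 0
    for piece, square in pieces:
        h ^= ZOBRIST[piece][square]
    return h

def hash_after_move(h, piece, from_sq, to_sq):
    """Incremental update: XOR is its own inverse, so XOR out the old square
    and XOR in the new one, without rehashing the whole position."""
    return h ^ ZOBRIST[piece][from_sq] ^ ZOBRIST[piece][to_sq]
```

The full recomputation and the incremental update always agree, which is why
engines can keep the key current with a couple of XORs per move.  The key
remains a many-to-one mapping of positions, so two different positions can
still share a key.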



>.
>.
>.
>"4.6.1 Choice of transformation function
>   A prerequisite of a good transformation function is that it distributes the
>keys as evenly as possible over the range of index values.  Apart from
>satisfying this requirement, the distribution is not bound to any pattern, and
>it is actually desirable if it gives the impression that it is entirely at
>random.  This property has given this method the somewhat unscientific name
>'hashing', i.e., 'chopping the argument up' or 'making a mess,' and H is called
>the 'hash function'.  Clearly, it should be easily computable,..."
>Bingo!  Game set and match.  What you call hashing he refers to as a key
>transformation, which may or may not be many-to-one.  If Wirth is to be
>believed, you were wrong on all counts.  By the way, you never did address the
>howler you made about perfect hashing being an oxymoron ;-).

I believe that (a) you are simply searching for phrases to win an argument
that is wrong.  As he said, "hashing" is yet another term for mapping or
transforming.  And he _clearly_ said many-to-one above where I pointed it out.
So you get no point there, and a grade of zero.  And no amount of arguing is
going to change what was said above...  (b) "perfect hashing" was not a
"howler", because it applies to a many-to-one mapping, as I said before.  And
there is no way to be 'perfect' there.  You can _not_ avoid collisions.  You
can not transform a set of keys to a set of indices such that the indices have
no duplicates nor gaps when the original set of keys is larger than the final
set.
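The pigeonhole point above can be checked mechanically: map more distinct keys
than there are slots and some slot must receive at least two.  A small sketch
(the key count and table size are arbitrary):

```python
def slot_counts(keys, n_slots):
    """Count how many keys each slot receives under a simple mod reduction."""
    counts = [0] * n_slots
    for k in keys:
        counts[k % n_slots] += 1
    return counts

# 1000 distinct keys into 256 slots: the fullest slot must hold at least
# ceil(1000 / 256) = 4 keys, so collisions are unavoidable.
max_load = max(slot_counts(range(1000), 256))
```

No choice of reduction function changes this; only shrinking the key set or
growing the table does.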

If you want to try to make points by taking strange interpretations of what you
read, to get your professor's goat, feel free.  However, some of us don't buy
such nonsensical arguments.  Nothing you quoted above changes anything I have
said...




>>
>I will let you have the last word on this thread. I'm seriously tired of it.
>My apologies to Wirth for any errors I may have made in transcription.  The
>2nd and 3rd from last single-quoted items were double-quoted in his text.  The
>other single-quoted items were in italics in his text.




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.