Computer Chess Club Archives


Subject: Re: zobrist hashing

Author: Ricardo Gibert

Date: 07:07:16 05/30/99

On May 29, 1999 at 23:33:06, Robert Hyatt wrote:

>On May 29, 1999 at 22:29:15, Ricardo Gibert wrote:
>
>>Section 4.6 in Algorithms + Data Structures = Programs by Wirth is correct.
>>>
>>The 1983 edition of Algorithms by Sedgewick gives the typical incorrect
>>exposition of the topic, which you seem to echo rather closely.
>>>
>>The 1984 Edition of Fundamentals of Computer Algorithms by Horowitz & Sahni I
>>have is also incorrect.
>>>
>>I'll check Knuth when I pass by the bookstore today.  I'm pretty sure he gets it
>>right.  He is very meticulous & comprehensive of course.  Not very good at
>>explanations though.  Not a good writer.  He seems to set the standard,
>>nevertheless.  I'll probably wind up posting a quote from Wirth even though it
>>looks like it will be overly long. They don't let you use a copier in bookstores
>>unfortunately.
>>>
>>Just think about what you are saying.  "M mod N" would then be a hashing
>>function according to you.
>
>
>M mod N is most definitely a hashing function.  It is a poor one because only
>the low order bits play a part in the result.  But hashing it is.  As is the
>Zobrist approach... as is the approach of summing the characters in a name, and
>so forth...
>
>But as I said, what would be the purpose of mapping an N bit value into an
>N bit value?  It still gets used as an index into a hashed 'thing'.
>
>>  A poor one, because of all the collisions.  Hashing
>>is a strategy to spread out keys to avoid collisions when you map from M to N.
>>Mapping function is the term you defined.  All B-trees are mapping functions
>>too, but are not hash functions as you imply.  They fit your "definition".
>
>B trees have nothing to do with hashing.  They don't 'map' anything at all
>by my definition, i.e., the idea of 'hash=F(value)' where value is bigger than
>the result 'hash'.
>
>
>
>
>>>
>>What happens in computer science, as in many fields, is terminology &
>>definitions are used "loosely".  It is very awkward & laborious to be precise
>>all the time.  Particularly in conversation.  The trouble is the listener/reader
>>does not know when the writer/speaker is being precise or is using the term
>>loosely.  So the facts get "hashed" and you have many books treating topics
>>incorrectly.  I can give many examples in sorting & numerical analysis if you
>>are interested.
>
>
>I don't see a 'loose definition'.  Unless you map from large -> small value,
>collisions don't make sense.  But if you get collisions when mapping from X to
>X' where the lengths are the same, I fail to see the purpose.  In the books
>I have, typically in data structures, hashing includes a discussion about
>collisions, chaining, buckets and ways to handle collisions.  Collisions only
>make sense when mapping large things to a small address space.  I am used to the
>idea of a mapping function X'=F(X). And for this mapping function to be called
>a 'hashing function', then X' < X for all X.  Usually significantly less than...
>as in chess board -> 64 bits or 32 bits.
>
>>>
>>By the way, I used to drive my college profs crazy with this type of problem.
>>So you have my sympathies!
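Incidentally, the "M mod N" mapping we keep tossing around is trivial to
demonstrate in a few lines.  A minimal Python sketch (the names and numbers
are mine, picked for illustration):

```python
# A sketch of "M mod N" as a many-to-one map into a table of slots.
def h(key, n_buckets):
    """Map an arbitrarily large key into one of n_buckets slots."""
    return key % n_buckets

# Two distinct keys landing in the same slot -- a collision.
a = h(12345, 64)
b = h(12345 + 64, 64)
assert a == b  # keys differ, slots coincide: the map is many-to-one

# For N a power of two, only the low-order bits of the key matter:
assert 12345 % 64 == 12345 & 63
```

Whether one calls that a "hash function" or merely a "key transformation"
is of course the whole dispute.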

Quoting from Wirth:
"4.6 Key transformations (Hashing)"
.
.   <text skipped here>
.
"...Clearly, in a computer store each item is ultimately accessed by specifying
a storage address 'a'.  Hence, the stated problem is essentially one of finding
an appropriate mapping H of keys (K) into addresses (A): H: K -> A
In section 4.5 this mapping was implemented in the form of various list and tree
search algorithms based on different underlying data organizations.  Here we
present yet another approach that is basically simple and very efficient in many
cases.  The fact that..."
This contradicts what you said and agrees with what I said.
   "The data organization used in this technique is the array structure.  H is
therefore a mapping transforming keys into array indices, which is the reason
for the term 'key transformation' that is generally used for this technique."
Wirth does not use the term "hash" for the 1 1/3 pages until discussing the
scattering technique which IS hashing.
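For the record, the scattering scheme the thread subject refers to is easy to
sketch.  A minimal Python version of Zobrist hashing (all names here are mine,
chosen for illustration):

```python
import random

random.seed(0)  # fixed seed so the key table is reproducible

PIECES = range(12)   # 6 piece types x 2 colours
SQUARES = range(64)

# One random 64-bit key per (piece, square) pair.
ZOBRIST = [[random.getrandbits(64) for _ in SQUARES] for _ in PIECES]

def zobrist_hash(position):
    """position: iterable of (piece, square) pairs."""
    h = 0
    for piece, square in position:
        h ^= ZOBRIST[piece][square]
    return h

# XOR makes the hash order-independent, and updates are incremental:
# moving a piece is just two XORs on the running hash.
pos = [(0, 8), (6, 48)]
assert zobrist_hash(pos) == zobrist_hash(list(reversed(pos)))
```

Real engines fold in side-to-move, castling, and en-passant keys the same way.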
.
.     <text skipped here>
.
   "The fundamental difficulty in using key transformation is that the set of
possible key values is very much larger than the set of available store
addresses (array indices).  A typical example..."
.
.     <text skipped here>
.
"The function H is therefore obviously a many-to-one function."
Wirth still does not call this hashing as you do.  He now discusses collisions &
how to deal with them, which I now skip.
.
.
.
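Since I skip Wirth's collision treatment, here is a minimal sketch of one
standard remedy he covers, chaining; all names below are my own choosing:

```python
# Colliding keys share a bucket; each bucket is a short list of pairs.
class ChainedTable:
    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _index(self, key):
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))       # append a colliding key

    def get(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)
```

The other common family of remedies is open addressing (probing within the
array itself).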
"4.6.1 Choice of transformation function
   A prerequisite of a good transformation function is that it distributes the
keys as evenly as possible over the range of index values.  Apart from
satisfying this requirement, the distribution is not bound to any pattern, and
it is actually desirable if it gives the impression that it is entirely at
random.  This property has given this method the somewhat unscientific name
'hashing', i.e., 'chopping the argument up' or 'making a mess,' and H is called
the 'hash function'.  Clearly, it should be easily computable,..."
Bingo!  Game, set, and match.  What you call hashing he refers to as a key
transformation, which may or may not be many-to-one.  If Wirth is to be
believed, you were wrong on all counts.  By the way, you never did address the
howler you made about perfect hashing being an oxymoron ;-).
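The evenness Wirth demands in 4.6.1 is also easy to check empirically.  A
quick Python sketch (mine, not from the thread), showing why a power-of-two
table size can be a poor choice for "M mod N" while a prime spreads the keys:

```python
# All-even keys: a realistic skew (e.g. word-aligned addresses as keys).
keys = [2 * i for i in range(1000)]

used_pow2 = len({k % 64 for k in keys})      # power-of-two table size
used_prime = len({k % 61 for k in keys})     # prime table size

assert used_pow2 == 32   # only the even slots are ever hit
assert used_prime == 61  # every slot is hit
```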
I will let you have the last word on this thread. I'm seriously tired of it.
My apologies to Wirth for any errors I may have made in transcription.  The 2nd
and 3rd from last single quoted items were double quoted in his text.  The other
single quoted items were in italics in his text.



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.