Computer Chess Club Archives

Search

Terms
Messages

Subject: Re: How to implement hash tables...

Author: Heiner Marxen
Date: 06:50:48 04/23/00
On April 23, 2000 at 03:16:14, Matt wrote:

>I appreciate the help!  I found your explanation most helpful.  As always
>though, your explanation generates even more questions!

Fine!  :-)
Followup questions are expected, that is ok.

>On April 22, 2000 at 12:11:15, Heiner Marxen wrote:
>
>>On April 22, 2000 at 10:35:23, Matt wrote:
>>
>>>Hi...
>>>
>>>I'm a relatively lousy programmer... I've written a very basic chess engine
>>>(around 1950 on FICS) and I would like to work on adding hash tables to it,
>>>however I haven't found any good references to how this is done...  I know the
>>>crafty source is available, but the crafty source scares me.  What I am looking
>>>for is a very general overview of how hash tables work.
>>
>>Basically, you try to have a cashe for the function "search(board, depth)".
>>When search() is called with the same board and with the same depth again,
>>you want to remeber the result, since you are going to do the exact same
>>computation again.   The remebered result avoids the second computation.
>>[ A closer look shows, that the computation is not exactly the same,
>>  since sometimes path dependant information is used.  For the hash table
>>  this is generally neglected, but it can make problems. ]
>>
>>>As I understand it, it works something like this:
>>>
>>>1.  generate hash key for position
>>>
>>>2.  check to see if that key is already in table.  if it is, return the score
>>>from the hash table (eliminating the entire subtree, as its already been
>>>searched before). if not, continue search as normal.
>>
>>... provided the stored depth is sufficient.
>>
>>3. After having continued normally, and thus generated/computed a result
>>   worth to be found in the hash table... enter it into the hash table.
>>
>>>This is done from within your search function if I understand correctly...
>>
>>Yes.
>>
>>>I understand how to initially create and update the hash key on moves and
>>>takebacks, but...
>>>
>>>When are positions ADDED to the hash table?
>>
>>See point 3 above.
>>
>>>is the depth of the search included with the position when its added?
>>
>>Yes.  The depth of the search is a very important parameter to the search
>>function.  You would not want to retrive a 2-ply search result and use
>>that as result of a 7-ply search.
>>
>>>I assume the hash table isnt cleared after every move, how does that affect the
>>>search depth stored in the hash tables for positions found in previous searches?
>>
>>Not at all.  That position _was_ searched with that depth, period.
>>But you may want to install a generation counter scema, discussed recently
>>in this forum, in order to detect entries which have a large depth, but are
>>from a previous search, and thus can be overwritten by a new entry, even
>>if that has much less depth.
>
>Am I correct in saying that it is not necesssary to have this generation counter
>schema? hash tables will work without this modification?  This is my first
>attempt at doing hash tables, I want to keep it as simple as possible ;)

Yes, you are correct, that is not necessary.  Start without it.

>>>I am told one does not search the hash table, it performs a 'lookup'.  how does
>>>this work?
>>
>>You already have a hash key (your point 1 above).  It should be at least 32
>>bit, better more (most programmers seem to find 32 bit to be not enough).
>>48 or 64 bit will be sufficent.  From this huge value we now want to compute
>>a "hash index", i.e. which of the N entries in the hash table to use for
>>this hash key.  Generally you can do  index = key % N.  When N is a power
>>of two this is a simple masking operation.
>
>I was planning on using a 64 bit hash key for now....  I was going to use
>position, en passant, and castling permissions to generate the key, and leave
>out the 50 move and 3 move rep info for now.  This should be enough to make them
>functional, right?

AFAIK, yes, that looks perfect for me.

>I think this next part is the part I am getting hung up on.  Please let me know
>if this is incorrect.
>
>if we are searcing for this position in the hash table, we do
>key_of_current_position%N and find where the position WOULD be stored if it is
>there.

Exactly.

>the % function will return a number 0 to N-1, which represents the various
>'slots' for a hash table entry.
>
>Then we compare the hash key in the table with the one for the current position.

Yes.  There is a special case: the hash table entry may be empty.
That can be coded/detected in many different ways, but you should not try
to use an empty slot.  If an empty slot codes a search depth = 0, you
get it for free (provided the depths you try to get/store are always > 0).

>if the keys are the same, it is almost certainly the same position. We check the
>depth of the info in the hash table, and if it is >= to the current search
>depth, return the score from the hash table.

Yes, nearly exactly.  There are usually three different kinds of values, one
of which can reside in the hash table: a value is either a lower bound, an
upper bound or an exact value (those are the rarest).  Even when the depth
is sufficient, you may or may not be able to use a bound.

>if the keys are different, then there is a key for another position residing in
>the same spot as the key for this position would be.  So even though theres a
>stored position here, it is a different position. we continue with a regular
>search then overwrite the info in the hash table with the info for the position
>we just searched.

Yes.  You got the basic picture right.

>That brings up another question... If we find the wrong position in our hash
>table, do we ALWAYS overwrite it?

That is up to you.  This part is called "replacement strategy".  The most
simple replacement strategy is "always replace".  A usual refinement is,
to compare the depth of both entries, and let the larger depth win (with the
idea that that is the more valuable entry).  That has its own dangers:
the table may fill up with medium depth entries, such that small depth values
are not stored any more.

To help that, you can have two parallel tables, which you probe both to find
an entry, and if not found and computed by search you store the new value
in one table unconditionally, and in the other depending on depth.
(I think that is the crafty strategy).
Another counter measure is to introduce some sort of randomness.

>Also, is there any case in which the hash table SHOULD be cleared?

Well, it should be cleared, when the contents are suddenly wrong, since
the context changed significantly.  If your program would allow to switch
to "loosing chess" rules in the middle of a game, at that point the hash
table starts to contain lies.

It _may_ be a good idea to clear the hash table, when we expect the contents
to be useless for the further operation of the program.  Then the contained
entrys appear to be valuable (some have large depthes), and the replacement
strategy may favor the old (useless) entrys over new entries.

Under normal circumstances and with "always replace" I cannot see any reason
to explicitly clear the table.

>And finally, when somebody mentions a hash table collision, are they referring
>to a case where two positions have the same hash key?  Also, if thats the case,
>how does one detect this?

Yes, that is a hash table collision.  Detected: the selected entry does not
contain the current position, but is also not free.  That is the case, when
the replacement strategy is applied, as discussed above.  "Collision" is
the technical term for this situation.

>Ok I lied, one more question.  Do we store these hash entries regardless of
>search depth? ie do we store them at depth 0?

No, not completely regardless of context.  There is a tradeoff.  Probing the
table and storing results for future hits is not for free, there is an
overhead.  If the straightforward computation would be less expensive than
probing and storing the result (i.e. driving the cache machinery), then
there would be no point in doing so.

Also, you want the valuable entries to survive in the table until they are
reused, i.e. you do not want to sweep the hash table with millions of
cheap entries.  But that is just my opinion.  I recall someone told us here
that he caches even the positions that come up in the q-search.

It will depend on your program.  Try to estimate the tradeoff, decide for
some first strategy, and then compare against variations thereof.
There is no single correct way to do this.

>>>Any help would be most appreciated.  please use small words, this is all new to
>>>me :)
>>
>>Hopefully my words were small, and my sentences clear.  Otherwise ask again,
>>or read the many other explanations to appear here.
>>
>>Of course, my explanations above describe a very simple scema.  Especially
>>the strategy for searching and replacing slots in the table can be refined.
>>But the most simple scema is not so bad.
>
>I'm looking for the most simple right now.  Once i get a handle on that, I can
>start breaking it with "improvements" ;)

That is a good way to do it.

>>Heiner
>>
>>>Matt
>
>Thanks again for all the help

You are welcome!

Heiner
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.