Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Hashtables: is larger always better?

Author: Robert Hyatt

Date: 07:24:17 09/25/01

Go up one level in this thread


On September 24, 2001 at 18:06:30, Sune Fischer wrote:

>On September 24, 2001 at 16:24:01, Robert Hyatt wrote:
>
>>Who cares about the number of hash entries?  We aren't scanning the entire table to find a false match.  We index to one position only.  Suppose the table was 2^64 in size so you could store the entire set of possible hash signatures.
>>Does that increase the chances of a collision?
>
>Yes it does - as Uri has also pointed out.
>
>See, if you make a hash the size of 2^64 and fill it (!)
>then you have no free keys left!
>The next position key you generate will match one
>of those in the table with 100% certainty!
>
>This is the most extreme case, in fact you will never be able
>to update your hash with new positions because you are fooled
>to believe you have calculated them all. You will have a very
>high collision rate in this case.

I disagree there.  "high" is relative.  There will be 2^96 different positions
that each map to the same hash table entry.  But the search space will not
include _all_ of those unless we somehow find a way to search to the end of
the game.  It is possible that our search space will never include 2^64 total
positions.  And if the tree is bushy (as it is at present) rather than being
narrow and deep, this further reduces the chances of a collision using the
Zobrist approach we now use.

So far, the sum-total of all memory manufactured has not reached 2^64.  It won't
for a long time.  Much less putting it on one machine to hold 2^64 hash entries.
Even searching 4 B nodes per second, still will require 4B seconds to fill such
a table.  We are many years away from 4B speeds.  The shallower the search, the
lower the probability of a collision.  IE for a 1 ply search, _no_ collisions
will be possible.  Ditto for 2 plies.  It takes a lot of depth to perturb the
hash signature, then un-perturb it back to its original state to get a
collision.  A _lot_ of depth...

I believe Burton Wendroff published some analysis on this in an older JICCA
issue.  I believe it suggested that full-width 20 ply searches are perfectly
safe and free of collisions with properly chosen Zobrist random numbers.

>
>If you only have a small hash of 2^10 entries, then you "only"
>have logged 2^10 keys and so there are many free ones left, and
>you will be able to update the keys that don't match.
>
>Look at it this way:
>You have N keys saved in the hash.
>You generate a new key matching a bran new position never seen before.
>What are the chances that this key collides with one in the hash?
>Well there are 2^64 combinations and N "bad" ones you want to avoid:
>chances are N/2^64 of a collision.
>
>
>> Suppose your table has 1
>>entry.  you can _still_ get a collision.
>
>Yes, but odds will only be 1/2^64.
>


It has to be higher.  There is a definite probability, with the way the
table is updated and probed, that I search two out of a set of 2^96
colliding entries before I search anything else.  And I don't believe that
is particularly unlikely once we reach the NPS values needed to make this
discussion meaningful.

For the moment, it is _all_ very moot.  We can't search anywhere near deep
enough to produce a significant number of 64 bit collisions.  As even 4B
nodes is a _huge_ search and is nowhere near filling a 2^64 table.

For 2^64 the number of potential collisions is huge.  For a program that can't
search 20 plies full-width in the middlegame, the number of actual collisions is
near to zero with reasonable random numbers.


>>I think you are trying to consider "time" in the equation here.  The farther
>>apart two entries are stored in time, the less chance they will falsely match
>>since the first has a greater chance of getting overwritten before the second
>>is reached.  The chances of a collision with 64 bits are so remote, I wouldn't
>>give it a moment's thought.  Because even with a _full_ 64 bit address space
>>for the table, the probability of a false match is one out of 2^64.  Very
>>small.  Because for any 64 bit hash signature, there is only _one_ place I
>>will try to find it stored in the table, regardless of how large the table
>>is.  With that small a chance to get a false match with a full address space,
>>smaller tables only reduce the chances further due to overwrites.
>
>Read some of Uri's posts in this thread, he expains it very well.
>
>-S.



This page took 0.01 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.