Computer Chess Club Archives



Subject: Re: Hashtables: is larger always better?

Author: Robert Hyatt

Date: 11:21:32 09/25/01


On September 25, 2001 at 11:20:47, Sune Fischer wrote:

>On September 25, 2001 at 10:12:31, Robert Hyatt wrote:
>
>>>I understood the question as: "is larger _always_ better".
>>>The answer would be no.
>>>Because a hash table of size 2^64 and a computer that could do 2^64 nodes per
>>>move would fill this in no time and the rest of the game after ply ~30 would be
>>>_all_ collisions!
>>
>>
>>No it wouldn't.  Remember that there are roughly 2^160 unique chess
>>positions, and we can store up to 2^64 of them.  That leaves roughly 2^96
>>positions mapping to every entry in the 2^64 table.  It doesn't matter
>>whether you store 2^32 entries or 2^64 entries; the potential for
>>collisions is so large either way that the difference doesn't matter.
>
>Using my example with 2^64 unique nodes calculated per move and a 2^64 size
>table, you will fill this table on the first move (assuming no collisions). If
>you have an average branch factor of 6, then this is less than 30 plies deep. So
>when you get to positions deeper than 30 plies, you will be seeing only new
>positions, and since your table is full with outdated nodes, they will all
>collide in the hash.

No... that is simply an incorrect assumption.  You are assuming that we search
until we fill 2^64 entries, which is going to take far more than 2^64 nodes of
search due to transpositions.  Then we somehow jump over to another part of
the search that _only_ produces positions that match in 64 bits, but not in
the full 160.  That is very unlikely to happen.  The game progresses
naturally forward, and _most_ probes into a space containing a full 2^64
nodes are going to be _real_ matches, not false ones.  It is more likely that
as you search, old things _will_ be overwritten by new things, due to the way
I "age" the hash entries.  When the table is full, not _every_ probe is going
to be a false match.  Almost 100% of them will be true matches, with a few
false matches thrown in as new positions are searched that match in 64 bits
but not in all 160.
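
To make the "age" idea concrete, here is a minimal transposition-table sketch
(simplified for illustration; not the actual Crafty code, and the names are
made up).  The entry stores the full 64-bit Zobrist signature, so a false
match requires two different positions that agree in all 64 bits, and
anything left over from an older search is overwritten first:

    #include <stdint.h>
    #include <stddef.h>

    typedef struct {
        uint64_t key;    /* full 64-bit Zobrist signature             */
        int16_t  score;  /* stored search score                       */
        uint8_t  depth;  /* draft the score was computed to           */
        uint8_t  age;    /* id of the search that stored this entry   */
    } TTEntry;

    #define TT_SIZE (1u << 20)        /* 2^20 entries, as an example  */
    static TTEntry table[TT_SIZE];
    static uint8_t current_age;       /* bumped once per new search   */

    /* A probe is a hit only if the stored signature matches in all
       64 bits; otherwise it is a miss (an empty slot, or some other
       position occupying the entry). */
    TTEntry *tt_probe(uint64_t key) {
        TTEntry *e = &table[key & (TT_SIZE - 1)];
        return (e->key == key) ? e : NULL;
    }

    /* Entries from an older search are overwritten unconditionally;
       within the current search, deeper drafts are preferred. */
    void tt_store(uint64_t key, int16_t score, uint8_t depth) {
        TTEntry *e = &table[key & (TT_SIZE - 1)];
        if (e->age != current_age || depth >= e->depth) {
            e->key = key;
            e->score = score;
            e->depth = depth;
            e->age = current_age;
        }
    }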

This logic that says a larger hash table is worse is simply defective.  There is
a distinct gain as the hash table is made larger.  I simply claim that this gain
_always_ overrules any possible small increase in collisions.  _always_.

Which means, simply stated, "bigger is _always_ better, or at least it is
_never_ any worse than a smaller table.  Not always better because once the
table is big enough to hold everything you can search, more won't help."
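
To put a rough number on that trade-off (a back-of-the-envelope estimate,
assuming well-distributed 64-bit signatures, not a measurement): if a search
makes N probes while the table holds S distinct positions, then

    \[
    E[\text{false matches}] \;\approx\; \frac{N \cdot S}{2^{64}} .
    \]

Doubling the table at most doubles S, which leaves N*S/2^64 astronomically
small for any realistic search, while the number of _true_ hits grows with
every extra position the table can keep.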



>
>If you "only" use a 2^32 size table, you will have a 50% chance for each of the
>2^64 nodes to collide, so that's 2^32 collisions per move.
>This continues all the way down to 1, where there will be only 1 collision on
>average per move.

Again, this assumes that the search is by then only visiting _new_ positions,
which is not a good assumption.  By the time the table is that full, most
probes will be hits or misses, rather than false matches.



>
>
>>And the only undetected collisions are
>>going to be the cases where you search two of those 2^96 different positions
>>that map to the same table entry.  And it won't matter whether you have just
>>one table entry or 2^64.
>
>It all depends on the number of nodes searched; if you only search around 2^22
>nodes per move, then you would never fill the huge table anyway.
>And when computers someday actually become that fast, then we'll just use larger
>keys.
>
>>going from 2^20 entries to 2^21 entries does not change this enough to even
>>think about.
>
>It changes by a factor of 2, I'm not saying that it is a problem,
>I'm merely saying that it is a mathematical fact.
>(I have made the assumption that you fill the table quickly, because these
>probabilities are incorrect while there is still available space in the hash.)
>


If the table could be that big, then for each iteration roughly half of the
total nodes searched would be the same nodes that were searched in the last
iteration, with a new "layer" of endpoints added on.  But then these "new"
positions are very likely not new at all, but simply transpositions to
positions already searched.
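
As a rough sketch of where that one-half comes from (assuming a uniform tree
with an effective branching factor b, which a real search only approximates):

    \[
    \frac{\text{nodes revisited at iteration } d}
         {\text{nodes searched at iteration } d}
    \;\approx\; \frac{b^{\,d-1}}{b^{\,d}} \;=\; \frac{1}{b} ,
    \]

so with b around 2, roughly half of each new iteration is a re-search of the
previous one.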

You are making an assumption that isn't exactly correct: that as we search,
we are only traversing _new_ positions that have not been seen before.  That
is generally wrong.  It would not be hard to compute this statistic for a
reasonably sized hash table using a real engine and a real search.
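
One way to do it, for instance (a hypothetical instrumentation sketch; the
second signature and the names here are invented for illustration): give each
entry a second, independently generated 64-bit signature.  A probe that
matches on the primary key but not on the check key is almost certainly a
false match:

    #include <stdint.h>

    typedef struct {
        uint64_t key;    /* primary 64-bit Zobrist signature           */
        uint64_t check;  /* independent second signature (stats only)  */
        /* ... score, depth, age as in a normal entry ...              */
    } StatsEntry;

    static unsigned long hits, misses, false_matches;

    /* Classify one probe of the instrumented table. */
    void classify_probe(const StatsEntry *e, uint64_t key, uint64_t check) {
        if (e->key != key)
            misses++;            /* empty slot or a different position  */
        else if (e->check == check)
            hits++;              /* true match, with high probability   */
        else
            false_matches++;     /* matches in 64 bits, but a different
                                    position: a real collision          */
    }

Run that over a few test positions and the ratio of false_matches to total
probes answers the question directly.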

>>We would just be depending on a bit of luck to cause the first
>>of the two positions to be overwritten before we probe and false-match on the
>>second.  That is yet another degree of freedom.  And it gets dwarfed by that
>>huge collision predictor (2^96 positions for every single table entry)
>>anyway...
>
>I'm not disagreeing with that at all.
>
>-S.


