Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: Hash collisions and a little maths

Author: Vincent Diepeveen

Date: 05:03:23 02/24/99

Go up one level in this thread


On February 24, 1999 at 06:11:22, Rémi Coulom wrote:

>On February 18, 1999 at 21:06:53, Vincent Diepeveen wrote:
>
>[...]
>>
>>Cool that there is someone who can fiddle with math.
>>
>>Can you change all this to a function that
>>gives the estimated number of hash collisions with inputs:
>>
>>  - #bits
>>  - size of hashtable
>>  - loading factor of hashtable
>>
>>Many thanks,
>>Vincent
>>
>[...]
>
>If you assume that hash codes are independant and identically distributed (which
>is wrong, but might be a good approximation for this purpose), and that your
>hash table is always full (which is the case if you do not clear it after each
>move), then the probability to have a collision for a new position (that is not
>already in the table) is equal to pc = (size of hash table) / (number of hash
>codes). You will have to multiply this probability by the number of rehash you
>do if you use one part of the hash code for indexing and this part is not stored
>in the hash entry for comparison.
>
>Under these hypotheses, we have a Poisson process and the average time before
>the first collision will be 1/pc probes. Note that these are probes with
>original positions, and does not count positions that are already in the hash
>table.
>
>For instance, TCB crashed on a hash collision last week-end, as it was using
>2^18 hash entries, each of them containing 32 bits of hash codes, with a rehash
>of 2. I had pc = 2 * 2^18 / 2^(18 + 32) hence 1/pc = 2^31, which makes
>collisions very likely to happen in a long match.

32 bits are just 4 billion possibilities.

>I think that testing the legality of the hash move is a good way to reduce pc
>significantly. I also think that using the Zobrist technique for computing hash
>codes makes collision less likely since hash codes are not independent and
>identically distributed : the closer the positions, the less likely a collision.

Sure, i have tested that and i couldn't find a way to get a collision first
few XORs, however testing more possibilities was too difficult as there
are too many possibilities to test...

>A good choice of hash keys can help also (see my post about BCH codes).

Right see below.

>A good way to measure all of this would be to make experiments with a small
>number of bits of hash code and measure the collision rate with different
>techniques : with or without test on the hash move, with random or BCH hash
>keys, etc. If I find time to do it one day, I will post the results here.

So there is currently no good formula for Zobrist collisions?

>Remi

I'm using Zobrist. 64 bits, and i store around 32+ 12 + 4 = 48 bits,
but i store 44 bits of the most significant digits and of the least
significant digit i store 4, but that doesn't count as this is index too.

So actually i store 44 bits and my 8 probe is reduced by those last 4 bits,
and with 2^20 size of hashtable i still have 64 bits.

So in short i can assume i can store 64 bits, and i use Zobrist and
reasonably well random numbers (generated by mathematica and a little
modified randomly by hand by me, as the PCs random numbers are not
random i found out some years ago; i got every few hundred thousands
nodes a collision then).

Very important is still to check what side has the move, although i
hash that too. Also i hash en passant and castling. Lost an auto232
game against genius once because diep didn't hash en passant and wanted
to mate in 3 moves because of that, and therefore allowed genius to
promote. Still have that game somewhere.

I have not seen big problems with this 64 bits zobrist
approach, not even when searching 4 billion nodes after a few days.

I know however that i must have had wrong collisions, probably the fact
that i don't store true values hides collision probs from me, as i
cannot have true values in my program; if a score is true then alfa
gets improved and then score is again <= alfa, so i use only 1 bit
to see whether it's >= beta or <= alfa.

I can store a position in just a few bytes, i'll run one day with a small
hashtable and a collission check. That would be cool... ...but after
the world champ i fear.










This page took 0.01 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.