Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: hash collisions

Author: martin fierz

Date: 07:11:23 01/31/01

Go up one level in this thread


On January 31, 2001 at 09:39:10, Vincent Diepeveen wrote:

>On January 31, 2001 at 08:13:52, martin fierz wrote:
>
>>hi,
>>
>>i recently corrected some code in my connect 4 program and now it is able to
>>solve connect 4 in less than a day on a fast PC with a large hashtable (of
>>course, connect 4 has been solved long ago). i tried again with a smaller
>>hashtable and
>>got some strange results, for very long searches (billions of nodes) i don't get
>>the same value for
>>the root position. for not-so-deep searches i get the same values for both
>>versions. i am wondering, [if this is not just a bug :-)] could this be some
>>hashcollision-problem? can anybody give me a probability for a hash collision
>>occuring when using B-byte key & lock, with a hashtable with S entries after
>>searching N nodes? (i am using 4 byte ints for the key and the lock)
>
>You're using 32 bits to hash?
>
>I think there are like 7x6 rows or so = 2^42 possibilities which you
>hash in 32 bits.
>
>Easier is to store the entire position. This fits easily in 64 bits
>as with are 7 rows (?) at most 7 open spots are there.
>
>you can do next: white = 0, black = 1.
>open spots also 0.
>
>Now also describe how far each row is open. That's 3 bits for 7 rows = 21
>bits.
>
>So in 42 + 21 bits you can store the entire position. That's not hashing
>but a true hash.
>
>For solving a game you definitely need to store the entire position!
>
>With just 32 bits you ask for trouble!
>
>>also, i think i remember somebody mentioning here that one can choose the random
>>numbers for the XORs in a clever way making hashcollision probabilities
>>smaller - can somebody tell me how?
>
>You don't need XOR. You need a bitmap:
>  #define BITBOARD unsigned _int64
>       in windows (new ansi-C convention also long long)
>  #define BITBOARD unsigned long long
>       for gcc compiler
>
>by making a few tables you can quickly update the true hash. Should
>update that incremental of course!
>
>Not as fast as a 32 bits XOR, but you'll see a lot of problems in your
>prog are solved!
>
>>cheers
>>  martin

hi vincent,

thanks for your tip - you are probably right with your suggestion that i should
store the whole position if i want to solve connect 4. i remember i used to do
that earlier, even, but switched to a zobrist scheme later because it was
faster, and then computers were so slow and my program had lots of bugs in it so
there was no way to solve connect4 at all.
still, as i also have a checkers program and all you guys have chess programs
which use zobrist hashing, what i'm really interested in is the answer to the
collision question and not the solution of connect4, which is already known
since about 1990.

my own reasoning is this: if my hashtable has 2^n entries and i have an m-bit
lock i have something like a total (n+m)-bit hashkey. if i assume total
randomness in my numbers, then the probability that 2 numbers are the same is
about 50% as soon as i have 2^[(n+m)/2] random numbers in my table (it's the
same thing with the class of 20 people where the probability that two have the
same birthday is about 50%). in my case with a 1'000'000-entry table n=20, m=32,
then i could have 2^26 random numbers until a collision is likely, thats
something like 67millions. however, i only have one million entries at a time,
so the probability for a hash collision should still be smaller than 50% for 67
million nodes. is this reasoning at least partly correct? i'm sure some of you
have figured these things out already :-)

cheers
  martin



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.