Author: Vincent Diepeveen
Date: 10:36:30 01/31/01
On January 31, 2001 at 10:11:23, martin fierz wrote:
>On January 31, 2001 at 09:39:10, Vincent Diepeveen wrote:
>
>>On January 31, 2001 at 08:13:52, martin fierz wrote:
>>
>>>hi,
>>>
>>>i recently corrected some code in my connect 4 program and now it is able
>>>to solve connect 4 in less than a day on a fast PC with a large hashtable
>>>(of course, connect 4 has been solved long ago). i tried again with a
>>>smaller hashtable and got some strange results: for very long searches
>>>(billions of nodes) i don't get the same value for the root position. for
>>>not-so-deep searches i get the same values for both versions. i am
>>>wondering, [if this is not just a bug :-)] could this be some
>>>hash-collision problem? can anybody give me a probability for a hash
>>>collision occurring when using a B-byte key & lock, with a hashtable with
>>>S entries, after searching N nodes? (i am using 4-byte ints for the key
>>>and the lock)
>>
>>You're using 32 bits to hash?
>>
>>I think there are 7x6 = 42 squares, so something like 2^42 possibilities,
>>which you hash into 32 bits.
>>
>>Easier is to store the entire position. This fits easily in 64 bits:
>>there are 7 columns (?), and each column has at most 7 possible fill
>>levels.
>>
>>you can do the following: white = 0, black = 1.
>>open squares also 0.
>>
>>Now also describe how far each column is filled. That's 3 bits for
>>7 columns = 21 bits.
>>
>>So in 42 + 21 bits you can store the entire position. That's not really
>>hashing any more, but an exact encoding.
>>
>>For solving a game you definitely need to store the entire position!
>>
>>With just 32 bits you ask for trouble!
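As a concrete illustration of the encoding described above, here is a rough
C sketch (the function and array names are mine, just for illustration):

    typedef unsigned long long BITBOARD;    /* unsigned __int64 on MSVC */

    /* Exact encoding: 42 disc bits (white = 0, black = 1, empty = 0)
       plus 7 x 3 bits for the fill height of each column = 63 bits. */
    BITBOARD encode(const int board[7][6], const int height[7])
    {
        BITBOARD code = 0;
        int col, row;
        for (col = 0; col < 7; col++) {
            for (row = 0; row < height[col]; row++)
                if (board[col][row] == 1)               /* black disc */
                    code |= (BITBOARD)1 << (col * 6 + row);
            code |= (BITBOARD)height[col] << (42 + col * 3);
        }
        return code;
    }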
>>
>>>also, i think i remember somebody mentioning here that one can choose the random
>>>numbers for the XORs in a clever way making hashcollision probabilities
>>>smaller - can somebody tell me how?
>>
>>You don't need XOR. You need a bitmap:
>>  #define BITBOARD unsigned __int64
>>  /* on Windows (MSVC); the new ANSI C convention is long long: */
>>  #define BITBOARD unsigned long long
>>  /* for the gcc compiler */
>>
>>By making a few tables you can quickly update this exact code. You should
>>of course update it incrementally!
>>
>>Not as fast as a 32-bit XOR, but you'll see that a lot of problems in
>>your prog are solved!
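Incrementally, dropping a disc is then just a couple of bit operations on
that code (again only a sketch):

    /* Set the disc bit for black and bump the column's 3-bit height field. */
    void make_move(BITBOARD *code, int col, int color)  /* white 0, black 1 */
    {
        int row = (int)((*code >> (42 + col * 3)) & 7); /* current height */
        if (color == 1)
            *code |= (BITBOARD)1 << (col * 6 + row);    /* black disc bit */
        *code += (BITBOARD)1 << (42 + col * 3);         /* height + 1 */
    }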
>>
>>>cheers
>>> martin
>
>hi vincent,
>
>thanks for your tip - you are probably right with your suggestion that i
>should store the whole position if i want to solve connect 4. i remember i
>even used to do that earlier, but switched to a zobrist scheme later
>because it was faster; back then computers were so slow and my program had
>so many bugs in it that there was no way to solve connect 4 at all.
Weird. I solved it some years ago already; I guess you are talking about
the 80s?
A big speedup is to use more probes for hashing!
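For instance, probing a few adjacent slots instead of one could look like
this rough C sketch (the minimal slot layout here is made up for the
example, and BITBOARD is the 64-bit type from above):

    typedef struct {
        unsigned int lock;          /* upper key bits, verified on a hit */
        short score;
        unsigned char depth, flags;
    } HashSlot;

    #define PROBES 4                /* scan a small cluster of slots */

    HashSlot *probe(HashSlot *table, unsigned int n_entries, BITBOARD key)
    {
        unsigned int i, slot = (unsigned int)key & (n_entries - 1);
        for (i = 0; i < PROBES; i++) {       /* n_entries is a power of 2 */
            HashSlot *e = &table[(slot + i) & (n_entries - 1)];
            if (e->lock == (unsigned int)(key >> 32))
                return e;           /* hit */
        }
        return 0;                   /* miss: overwrite one of these slots */
    }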
>still, as i also have a checkers program and all you guys have chess programs
>which use zobrist hashing, what i'm really interested in is the answer to the
>collision question and not the solution of connect4, which is already known
>since about 1990.
The main answer is to use more significant bits to hash; I advise 64 for
checkers.
In my draughts program, which is a bit more complex than checkers but
basically the same game, I use 64 bits to hash.
>my own reasoning is this: if my hashtable has 2^n entries and i have an m-bit
>lock i have something like a total (n+m)-bit hashkey. if i assume total
This assumption is correct, but hashing does not give a linear safety
trade-off when you reduce the number of bits; that is basically the big
problem.
Reducing the number of bits is deadly. Basically I have done 2 things
wrong over time in my draughts/chess programs:
 - in Napoleon (my draughts program) I stored 32 bits and further used
   a number of bits to index the table
 - instead of using a pregenerated good set of random numbers I used
   the built-in random generator
The last one is the easiest to start with. Most built-in random generators
are based upon the time, and usually there is a (multi-)linear relationship
between the numbers they produce. This is exactly the OPPOSITE of what you
want!
There are many good random generators. Nothing is as easy for scientists as
writing a random generator within a day and then keeping busy writing
articles about it, so there is plenty of choice among really good random
generators.
Important to know is how they were generated. Something that is making
'moves' in one way or another is not what you want. Really good are the
rotating generators, the RANROT ones; both C and C++ source is available
for them somewhere. Of course don't make the mistake of combining 64-bit
random Zobrist numbers out of 8-bit random numbers. Get the random numbers
from the generator at 64 bits at once!
My tip is to always search with the same set of random numbers, as this
gives determinism in your proggies.
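For illustration, filling a Zobrist table with full 64-bit draws from a
fixed seed could look like this (xorshift64* is only an example of a
decent 64-bit generator, not the RANROT one mentioned above; the table
shape is also just an example):

    /* Fixed seed: the same numbers every run, so searches are deterministic. */
    static BITBOARD state = 0x9E3779B97F4A7C15ULL;

    BITBOARD rand64(void)               /* xorshift64*, full 64-bit draws */
    {
        state ^= state >> 12;
        state ^= state << 25;
        state ^= state >> 27;
        return state * 0x2545F4914F6CDD1DULL;
    }

    BITBOARD zobrist[2][64];            /* [side to move][square], say */

    void init_zobrist(void)
    {
        int c, sq;
        for (c = 0; c < 2; c++)
            for (sq = 0; sq < 64; sq++)
                zobrist[c][sq] = rand64();
    }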
With that easy-to-solve problem out of the way, let's go to the second big
mistake I made in Napoleon.
I stored the 32 most significant bits and indexed a hashtable of around
2^20 entries (12 MB for a transposition table was a lot in those days).
So I thought: "that's 52 bits, not much less than 64 bits, and this game
is way simpler than chess."
OK, wrong assumption.
In draughts you can only search full width; all kinds of stupid forward
pruning just don't work, and already in opening positions I see lines
where one side wins all the pieces of the opponent.
So basically at any given stage of the game I can see *any* type of
position. This means that, relatively seen, the chance of getting a
collision is quite a bit higher than in chess.
So 52 bits wasn't enough, whereas with good random numbers and 64 bits I
didn't have any problems.
For those who never made a hashtable before, note what one should further
store:
 - bound: 2 bits (score <= alpha, score >= beta, or true score)
 - WTM:   1 bit  (side to move)
 - best: 10 bits (best move)
 - score: 16 bits
 - depth:  7 bits
 - mask:   9 bits (mask to decide whether to overwrite this entry)
And if your evaluation is expensive, or you use the evaluation to order
moves, it's smart to add another 16 bits for it.
Without that, the above is just something like 45 bits.
Assuming a hashtable size of at least 8M entries (2^23), the index alone
could give you 23 bits of the key for free.
However, if you do more probes, a position can sit in a handful of
different slots, so you should still store the last few bits of the index
yourself; otherwise you lose them in advance.
Luckily you can do this without losing a cycle.
So in short, assume you can save 20 bits through the index.
That leaves only 44 bits of the key to be stored.
Making the grand total 89 bits (44 + 45), plus a few other things you
might want to store.
So in a 16-byte entry you can without doubt store everything very easily.
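Packed into a 16-byte entry, that could look like the sketch below (only
one possible layout; bitfields on a 64-bit base type are a common compiler
extension, and you can always pack by hand with shifts instead):

    /* 44 leftover key bits + 45 info bits + 16 eval bits = 105 bits. */
    typedef struct {
        BITBOARD key44 : 44;    /* hashkey >> 20; the rest is in the index */
        BITBOARD depth :  7;
        BITBOARD bound :  2;    /* <= alpha, >= beta, or true score */
        BITBOARD wtm   :  1;    /* side to move */
        BITBOARD best  : 10;    /* best move */
        BITBOARD score : 16;
        BITBOARD eval  : 16;
        BITBOARD mask  :  9;    /* overwrite mask */
    } HashEntry;                /* padded to 16 bytes */

    /* storing: slot = (unsigned int)(key & (SIZE - 1));
                table[slot].key44 = key >> 20;  and so on */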
I remember a few years ago, on a 386 with hardly any RAM (1 megabyte), I
stored entries in 10 bytes.
When the 486 and Pentium arrived I went to 12 bytes an entry.
I see no way to do that anymore nowadays. In DIEP I fit things into
16 bytes with big effort, though I would prefer to get a lot more bits for
my overwrite mask, as that works better; most say you need only 1, 2 or
3 bits or so for the overwrite mask.
I always wonder how they test that: only at a single position instead of
over a whole game?
If you overwrite in such random ways as with 1-bit overwrite masks, I'm
sure the chance that you get a collision is a lot bigger than when using
9 bits.
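One plausible filling of such a 9-bit overwrite mask (purely an assumption
for illustration; the real contents of those bits are not spelled out
above) is a 2-bit search age plus a 7-bit depth:

    /* Overwrite stale entries, or equal/shallower ones from this search. */
    int may_overwrite(const HashEntry *old, int new_depth, int search_age)
    {
        int old_age   = (int)(old->mask >> 7);    /* top 2 bits: age   */
        int old_depth = (int)(old->mask & 0x7F);  /* low 7 bits: depth */
        return old_age != search_age || new_depth >= old_depth;
    }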
>randomness in my numbers, then the probability that 2 numbers are the same is
>about 50% as soon as i have 2^[(n+m)/2] random numbers in my table (it's the
>same thing with the class of 20 people where the probability that two have the
>same birthday is about 50%). in my case with a 1'000'000-entry table n=20, m=32,
>then i could have 2^26 random numbers until a collision is likely, that's
>something like 67 million. however, i only have one million entries at a time,
>so the probability for a hash collision should still be smaller than 50% for 67
>million nodes. is this reasoning at least partly correct? i'm sure some of you
>have figured these things out already :-)
I'm bad at calculating probabilities for chess or draughts, but I'm pretty
sure that you should not calculate the average case but take into account
the worst case, and add to that that you might have bugs in your
implementation.
Are you sure you debugged your hashtable well?
How can you be sure it works correctly?
I once found a bug in my hashtable 2 years after I had written that code!
Nowadays I rewrite my hashtable storage and retrieval code every 6 months
or so; very important nowadays is to take EGTBs into account :)
But about the probabilities: whatever your calculation says, in draughts
52 bits was not enough for me and 64 bits seems to be OK.
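Plugging numbers into the birthday estimate quoted above: among N random
b-bit signatures the expected number of colliding pairs is roughly
N^2 / 2^(b+1), which already shows why 52 bits hurts at billions of nodes
(a small C check, with made-up node counts):

    #include <math.h>
    #include <stdio.h>

    /* Expected colliding pairs among n random b-bit signatures. */
    double expected_collisions(double n, int b)
    {
        return n * n / pow(2.0, (double)(b + 1));
    }

    int main(void)
    {
        printf("52 bits, 1e9 nodes: %.1f\n", expected_collisions(1e9, 52)); /* ~111  */
        printf("64 bits, 1e9 nodes: %.3f\n", expected_collisions(1e9, 64)); /* ~0.03 */
        return 0;
    }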
Note that my draughts program searches at a speed that is going to amaze
most dudes here; the only penalty I'm suffering from is the hashtable, as
you need to probe the hashtable for every node.
If my RAM gets 2 times faster, my draughts program gets nearly 2 times
faster :)
>cheers
> martin