Computer Chess Club Archives



Subject: Re: Hash Collision

Author: Sune Fischer

Date: 10:55:12 12/08/02



On December 08, 2002 at 12:48:54, Uri Blass wrote:

>On December 08, 2002 at 12:23:59, Sune Fischer wrote:
>
>>On December 08, 2002 at 11:17:37, Uri Blass wrote:
>>
>>>On December 08, 2002 at 10:47:20, Sune Fischer wrote:
>>>
>>>>On December 08, 2002 at 10:26:40, Uri Blass wrote:
>>>>>The point is that when you are going to have better hardware you may have more
>>>>>collisions because of bigger hash tables and faster hardware.
>>>>
>>>>A larger table brings you closer to 64 bits: with 2^20 entries you
>>>>effectively have 32+20=52 bits, double that and you get 53 bits, etc., so in
>>>>this case larger tables generate larger, safer keys.
>>>
>>>How do you define your hash table?
>>>I use the following struct
>>>
>>>typedef struct tagHASHE
>>>{
>>>	__int64 key;
>>>	unsigned int depth:7;
>>>	unsigned int flags:2;
>>>	signed int value:16;
>>>	move best;
>>>} HASHE;
>>
>>I thought a bitfield had to be of the same type?
>>What is sizeof(tagHASHE)?
>
>sizeof(HASHE)=16
>
>sizeof(move)=4
>sizeof(__int64)=8
>the other fields have size 1,1,2
>
>I see that I have 7 wasted bits in the hash table and I can store more
>information with the same memory by replacing depth:7 flags:2 with
>depth:8 flags:8
>
>I may consider storing the number of legal moves there, or storing partial
>extensions.
>>
>>>It means that I have always 64 bits for the key.
>>>Do you use a key that is dependent on the number of entries in the hash table
>>>and in that case how do you define it?
>>
>>I store the lower 32 bits and use the upper 32 bits for indexing, and I do it
>>manually without bitfielding.
>>One of the main reasons I made the change was that using an 8-byte variable
>>caused the compiler to insist on an alignment that wasted 6 bytes per entry.
>>Changing all variables to no more than 4 bytes was smarter; then I only wasted
>>2 bytes.
>>
>>>I guess that when 2^k is the number of entries in the hash tables
>>>I need to have something like
>>>__int64 key:32+k;
>>
>>No, it should be
>>__int64 key:64-k;
>>
>>because you have already used those k bits for indexing, so there is no reason
>>to store them. If the keys don't match there, the positions will go into
>>different elements anyway.
>
>It seems to me that in that case you may have problems with big hash tables.

First, understand that the length of my key varies with the size of my table.
In this case I think the collision rate remains constant: I effectively add a
bit every time I double the size of the table, and this cancels out the effect
of the higher collision probability that comes from having more keys to collide
with.

>suppose for the discussion that you have 2^63 entries in the hash tables.
>
>you are going to have the same key in half of the times.

That is very true, but it is also a completely hypothetical case.
You are not going to store 2^63*sizeof(HASHE) bytes anytime soon, and even if
you could, you probably would not be able to fill the table.

But for the sake of argument, you are right of course. The only way to
completely eliminate collisions on a full table the same size as the key would
be to make the key large enough to be _unique_ for every position; I think 196
bits would come very close to achieving that. So there is an easy solution if
the situation one day calls for it.

>I thought about 32+k to give you always probability of at most 1/2^32 for
>collision when you search a new node.

Yes, it would remain constant, if that is your point.
However, I would not store 32+k bits in the bitfield, only 32; the rest is
embedded in the indexing of the table. Or, if you insist on 64-bit accuracy,
you should store 64-k bits, i.e. all the bits not used for indexing.

-S.
>Uri




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.