Author: Robert Hyatt
Date: 19:35:49 09/24/01
Go up one level in this thread
On September 24, 2001 at 18:44:04, Uri Blass wrote: >If the probability is really 1/2^64 for every new node then you are not getting >one hash collison for every few billion nodes. There is no way to produce perfectly random hash distributions. Let's do this in a mathematically sound form. First, I am going to assume that a real chess position requires 160 bits to represent it. This is pretty close to a real number, so let's use it as a fact. Second, we are hashing 160 bits of information into 64 bits. That suggests a _huge_ number of collisions. In fact, for each entry in the table, there will be 2^96 _different_ chess positions that map to the same hash signature. So there is no reasonable math that is going to suggest any sort of collision rate here. Now, on to the experiment several of us did a few years ago. I took a bunch of chess positions, and I did both hashing and _real_ verification. Each hash entry was encoded as 32 bytes (4 bits per square) so that the true board representation was matched when the 64 bit signatures matched. I ran this on a _monster_ machine as well as on my office machine. Searching about 1M nodes per second produced less than one real collision per hour. Here a collision is the _real_ thing... two positions where the 64 bit signatures match exactly, but the 32 byte encodings were different. Varying the hash table size didn't have much effect. We concluded that such a collision rate was acceptable. Stanback got similar results. We then did the same test using 32 bit hash signatures and found that was totally impossible. With multiple collisions per second (for me at 1M nodes per second), per minute for him (at a significantly slower search speed). Our conclusion: 64 bits is trustworthy. 32 is not. For the record, older programs like chess 4.x and Duchess did not store just a hash signature. They stored a 32 byte encoding of the board, using the Zobrist signature for the table address only. They were afraid of the signature-only approach. I did a bunch of testing in 1976 to convince myself that storing the 64 bit signature was "good enough". > >In this case you get one hash collision for every 2^64 nodes that is clearly >more than few billions and simple math shows that the total number of nodes that >deeper blue can search in 100 years is smaller than 2^64(I assume that Deeper >blue can search 2^30 nodes per second for this discusion). Doesn't matter. DB doesn't hash the last 5-7 plies + q-search. So they won't be stressing the 64 bit hash signature that badly... > >In 100 years you have 3600*24*365*100<2^32 seconds. > >The fact that you get practically hash collisions in rare cases prove that the >probability is bigger than 1/2^64 > >I do not say that the probabilities are not going to be enough for a long time >and I tend to believe that there is not going to be a problem in the next 10 >years but I need more data to be sure about it and it is the probability to play >a different move because of one hash collision at a random node. Easy to test. Run a test position for several minutes to get the right answer. Then re-run it but where you do the hash probe, make a percentage of the probes behave in a random fashion. IE the signature doesn't match, but say it does and return EXACT. Find out what the percentage has to be for the fake entries to influence the root score consistently. Then you will know how many false matches it will require to start to actually influence the score at the root, and the game. I did a small test like this a long while back. I only tested a 1 in one hundred million probe false match and found no effect I could measure at all. So the threshold is way below one collision every one hundred million nodes to screw things up... > >Maybe somebody can try to simulate hash collision by changing a score of only >one random node in the tree and giving information for the probability to get a >different move as a function of the size of the tree. > >I believe that you are going to find that a single hash collision is less >dangerous when the size of the tree becomes bigger. Isn't that logical? The smaller the percentage of false matches, the less the effect? > >Uri
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.