Author: Dan Newman
Date: 21:46:18 10/10/97
Hi all,

I've read with great interest various speculations about (and calculations of) hashing error rates. Ed Schröder mentioned that he'd done some tests which involved storing the whole position, but that this doubled the hash table size and cost a huge 20-30x performance hit--which I suppose is large enough to prevent accurate measurement in a reasonable amount of time. What follows is a technique that should accurately measure the rate with a nearly insignificant hit on performance. (I hope this isn't well known--I haven't seen it mentioned anywhere, though.)

1) Let's assume the position hash code is produced as two 32-bit values using the usual Zobrist scheme (and that all the bits in these words are "good"). One of these values is stored in the hash table entry as a verification key; the other is trimmed down (masked) to, say, 20 bits for indexing the hash table (1 M entries).

2) Room is made in the hash table entry (say 8 bits--it could easily be smaller) to store a secondary "error detection" key. Eight bits of the unused portion of the hash code are stored here. (I've always just thrown the remaining portion out--giving me a 52-bit signature for a 1 M entry hash table.)

3) Now, whenever you get a successful hash table hit (i.e., the current position's key matches the hash table entry's key and all other verification you normally do succeeds), you also compare the error detection keys. If these keys don't match, you record a hashing error. If they do match, you don't know whether there's an error or not. With an 8-bit error detection key, only about one in 256 hashing errors should go undetected by this scheme. (If the error rate is large enough, you can even correct for this by adding on another 1/256th. :))

There are lots of variations on this scheme: you can test 64-bit hash codes by generating yet another (independent) 8-bit (or whatever) hash code and using it for the error detection key.
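Steps 1)-3) above might be sketched in C roughly as follows. This is only an illustration, not code from any particular engine: the struct layout, names, and the 20/8/32-bit split (index in the low 20 bits, error key in the next 8, verification key in the high 32) are assumptions chosen to match the description.

```c
#include <stdint.h>
#include <stdlib.h>

#define INDEX_BITS 20
#define TABLE_SIZE (1u << INDEX_BITS)   /* 1 M entries */

typedef struct {
    uint32_t verify;   /* 32-bit verification key (high word of the code) */
    uint8_t  errkey;   /* 8-bit secondary "error detection" key           */
    /* ... depth, score, best move, etc. would go here ... */
} HashEntry;

static unsigned long hash_error_count = 0;

/* Store: low 20 bits index the table, the next 8 bits become the
 * error-detection key, the high 32 bits the verification key. */
static void tt_store(HashEntry *table, uint64_t code)
{
    HashEntry *e = &table[code & (TABLE_SIZE - 1)];
    e->verify = (uint32_t)(code >> 32);
    e->errkey = (uint8_t)(code >> INDEX_BITS);
}

/* Probe: on a verification-key match (a "hit"), also compare the
 * error-detection keys; a mismatch there is a detected hashing error.
 * The hit is still reported, since this is a measurement, not a fix. */
static int tt_probe(HashEntry *table, uint64_t code)
{
    HashEntry *e = &table[code & (TABLE_SIZE - 1)];
    if (e->verify != (uint32_t)(code >> 32))
        return 0;                      /* ordinary miss */
    if (e->errkey != (uint8_t)(code >> INDEX_BITS))
        hash_error_count++;            /* false match caught by the 8 bits */
    return 1;
}
```

With an 8-bit error key, a false match slips past this check only when the stray position happens to agree in those 8 bits too, i.e. about 1 time in 256.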
For testing smaller hash codes one can simply mask off a few bits from the key when making the usual verification check and use those few bits as the error detection key. Instead of

    if( current_key == hashtable_key ) {...}

do something like this:

    unsigned long test = current_key ^ hashtable_key;
    if( (test & 0x00ffffff) == 0 ) {
        if( test != 0 )
            hash_error_count++;
        [...]
    }

where the ellipsis is for whatever you do when you get a hit.

Now, the question is, how should the error rate be expressed: as a percentage of hash table accesses, a percentage of hits, errors per second, or what? I lean towards the errors/second measure, since it gives you an easy way to calculate the number of games that are potentially affected.

-Dan.

PS, I just tried this scheme in an old chess engine that I wrote (the current one doesn't have the transposition table stuff in it yet). The error rate was unmeasurable with just a few minutes of search time (45 bits of hash code, 8-bit error key). I lowered the number of bits in the hash code (index+key) to 29 and got an error rate of about 600-900 errors/second while playing a few moves starting from a couple of different positions. I upped the bits to 33 and got between zero and 8 errors/s; 37 bits gives between zero and one errors/s. With 41 bits the error rate was too low for me to get a good measurement (only 3 errors over a few minutes). The above was all at about 70-120 knps with an always-replace replacement scheme.

If the error rate for 29 bits is about 750/s, say, and every bit we add to the hash code lowers this by a factor of two (this seems correct to me), then for 53 bits (my usual) I should get 4.47e-5 errors/s, or .16 errors/hour--ugh. Maybe I should put those extra bits into the hash table entry and bump the hash code up to 64 bits (~.7 errors/year).

-Dan.
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.