Computer Chess Club Archives



Subject: Re: Hash collisions with deep searches

Author: Robert Hyatt

Date: 21:52:51 12/23/02



On December 23, 2002 at 13:56:17, Vincent Diepeveen wrote:

>On December 22, 2002 at 06:44:04, Frank Phillips wrote:
>
>>On December 21, 2002 at 13:57:09, Vincent Diepeveen wrote:
>>
>>>On December 20, 2002 at 22:05:14, Walter Faxon wrote:
>>>
>>>>On December 20, 2002 at 05:40:35, Gerd Isenberg wrote:
>>>>
>>>>>On December 19, 2002 at 16:50:32, Dann Corbit wrote:
>>>>>
>>>>>>Suppose that we are searching at long time control, and the hash table rapidly
>>>>>>fills up.  This can throw move ordering in the toilet, if we can't find
>>>>>>transpositions any more.
>>>>>>
>>>>>>What sort of replacement scheme is best for very long searches (several
>>>>>>minutes)?
>>>>>
>>>>>Hi Dann,
>>>>>
>>>>>I use a two table approach.
>>>>>
>>>>>A primary big one, where I replace one of eight entries with the lowest
>>>>>timestamp/draft/someFlags priority. Exact entries become a bit more
>>>>>resistant to overwriting, as do the leftmost successors of every root move
>>>>>(PV and refutation lines).
>>>>>
>>>>>A smaller secondary one, with an always replacement scheme.
>>>>>
>>>>>In general I don't store or probe in qsearch. I use IID in __every__ interior
>>>>>node that has no move hint from the hash.
>>>>>
>>>>>Regards,
>>>>>Gerd
>>>>
>>>>
>>>>Hi, Gerd.
>>>>
>>>>Good idea about IID!
>>>>
>>>>I have no chess program but it seems to me that one uses hashing for two
>>>>reasons:  (1) to save information that was expensive to compute but may not be
>>>>immediately useful (closer to the root), and (2) to save information that wasn't
>>>>very expensive to compute but may be useful very soon (close to the current
>>>>node).  That's why a two-table approach (or a one-table, two types of entry
>>>>approach) is so often preferred.
>>>
>>>Not really.
>>>
>>>First there was the concept most programs used: a single-probe transposition
>>>table. Then the two-table concept was invented, though eight-probe schemes
>>>existed long before that.
>
>>>Recently, cache lines (starting with DDR ram and RDRAM) have grown so much
>>>in length that you can implement a four-probes-in-a-row scheme for free,
>>>without risk.
>
>>>That obviously works a lot better than two slow accesses to RAM at random
>>>positions.
>>>
>>
>>I am confused.  Memory access for hash tables should be random - yes?
>>Nevertheless, will not the following be together in memory?  There could be
>>more elements within the structure, of course.
>>
>>
>>typedef struct trans_record {
>>	TransRecordT top;
>>#if defined TWO_LEVEL_HASH_TABLE
>>	TransRecordT bottom;
>>#endif
>>} HashRecordT;
>>
>
>The first reference to the hashtable will most likely be random. It's sad,
>but that's probably the truth. The other probes should be sequential.
>
>Here is what crafty 18.15 (from around june-august 2002,
>correct me if i'm wrong there) is doing:
>
>int HashProbe(.. ) {
>  HASH_ENTRY *htable;
>  ..
>  htable=trans_ref_a+((int) temp_hashkey&hash_maska);
>  .. (probe 1)
>  ..
>  htable=trans_ref_b+((int)temp_hashkey&hash_maskb);
>  .. (probe 2)
>
>Both trans_ref_a and trans_ref_b were allocated separately with a special
>malloc. That means each reference through trans_ref_a or trans_ref_b eats a
>random cache line from memory. That is very expensive. I never managed to
>measure it very well, but I'd guess it is around 400 clocks or something to
>get that cache line.
>
>Then another *random* cache line costs another 400 clockticks.
>
>Reading sequential memory is faster. It's good to know from Bob's posting
>here that the latest Crafty is doing that too.
>
>Good to know he corrected that.
>

It was a 2% improvement.  Not very significant as I had told you _many_
times...




>A single cache line gets 64 (DDR ram) or 128 (RDRAM) bytes
>in a single random read.

That is wrong.  DDR vs RDRAM has nothing to do with a cache line.  Cache
is the same regardless of which memory type is used.  On intel, L1 is 64
bytes per line, L2 is 128 bytes, if we are talking about PIV.




> Now if you do not align your hashtable to the start of a cache line (which
>is of course hard to do if your entry size does not divide 64), then 4
>sequential probes of 16-byte hash entries means you need 64 bytes.
>
>That can be 1 cache line, or with bad luck 2 cache lines. But they are
>sequential. Memory is far faster at reading sequential cache lines than at
>reading random ones.

It isn't "far faster".  It is "a bit faster".  And it isn't "bad luck"
either.  The probability can be simply calculated.  For Crafty, with 16-byte
entries, looking at L2 (the only cache that matters here, since it is the
cache that reads from memory), the probability of a single-line read is easy
to calculate.


>
>In fact it is *very fast* reading sequential cache lines.

Depends on the definition of "very fast".


>
>The 2 table concept (with sequential implementation)
>is far superior to a single probe.
>
>However, it effectively halves the total size of your hashtable.
>
>There is a far better way to do it, without giving up half the size of the
>hashtable.
>
>That's using a single table where you compute an entry number as you do now,
>then probe all 4 sequential entries.
>
>It is also less code than the 2 probes currently in Crafty.
>
>Bob 'unlooped it' in 18.15 but it is easy to loop:

I "unrolled" it for a reason.  It was trivial to roll it back up, but
the unrolled version is faster.

>
>htable = ..;
>for( i = 0 ; i < N_HASHPROBES ; i++ ) {
>  ..
>  htable++;
>}
>
>
>I use the number 8 for N_HASHPROBES in DIEP. I'm not claiming that works
>best for you either. With SDRAM I would limit it to 2, with DDR ram I would
>take 4, and with RDRAM I would go for 8.
>
>Of course that assumes 16 bytes per entry. In DIEP I regrettably didn't
>manage to get my entry size below 24 bytes, because I do not lock the
>hashtable (12 bytes are the hashkey signature; I could easily make that 20
>bytes if I locked the hashtable). A trick to go lockless in such a way that
>I do not lose those extra 4 bytes is something I would love to discuss here.

The approach Tim Mann and I did would work just fine...



>
>>
>>
>>>>Reason (2) implies that some sort of hashing should be done even in quiescence
>>>>search.  The problem with qsearch hashing nowadays is the memory-caching
>>>>penalty: if you have to wait on a memory access that might only help a
>>>>little, you might as well do the search directly.
>>>>
>>>>Maybe one might use a very small hash table (10K entries, say), reserved for
>>>>qsearch nodes?  (Or nodes beyond a certain depth or with reduced material.)
>>>>Particularly if you could freeze it all into the cache...
>>>>
>>>>I think an older ICCAJ paper detailed experiments with hash node type
>>>>allocation, but without reference to memory caching penalties.
>>>>
>>>>-- Walter




Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.