Computer Chess Club Archives




Subject: Re: Fixed

Author: Robert Hyatt

Date: 20:36:21 02/21/02

Go up one level in this thread

On February 21, 2002 at 21:14:50, Vincent Diepeveen wrote:

>On February 21, 2002 at 12:34:27, Robert Hyatt wrote:
>>On February 20, 2002 at 23:11:08, Angrim wrote:
>>>On February 20, 2002 at 19:28:42, Vincent Diepeveen wrote:
>>>>On February 20, 2002 at 16:10:01, Frank Phillips wrote:
>>>>>On February 20, 2002 at 12:48:06, Steve Maughan wrote:
>>>>>>Thanks for the help everyone!
>>>>>>I've implemented the idea I outlined and this has fixed the problem.  However
>>>>>>effectively I ignore possition where I cannot add a hash entry without
>>>>>>overwriting another.  This means that there is currently the possibility of a
>>>>>>draw not being detected.  I'll probably test a rehashing mechanism.
>>>>>Just stick the entry in the next available slot (that has a hash signature of
>>>>>zero).  The debug tests I did never showed more than a few re-hashes - once it
>>>>>was working properly.
>>>>>I changed to this method from a simple array list following Bruce's description,
>>>>>because it seemed so elegant with the possibility of speeding up detection in
>>>>>the endgame - when the list approach might have many entries to check.
>>>>>After falling for the obvious trap of not always removing an entry when exiting
>>>>>ABsearch() early, it seems to work fine.
>>>>>I noticed no significant speed loss (or speed up) in the short middle game test
>>>>>I use.
>>>>You measured pretty bad then depending upon hashtable size. An array
>>>>that gets read within 1 cache line and is already within L1 or L2
>>>>is kicking the hell out of this hashtable approach.
>>>Depends on the exact design. If the hash table is small enough then
>>>it will tend to be in L2 most of the time.  A 400 entry hash table
>>>is quite large enough to handle triple rep detection.
>>>Also, how are you fitting your array into a single cache line
>>>when you are searching 10+ ply deep?  It seems to me that
>>>you have to be storing at least 10, 8 byte hash entries in
>>>that array by the time you reach a leaf node.
>the hashtable trick works yes and is proven to be working.
>the array trick is way faster.
>usually every other move is a capture or pawnmove in 99.999% of
>all computerlines.
>So usually only a few ply must be searched.
>Right now a single cache line is 64 bytes. RDRAM is already 128 bytes
>and RDRAM is the future of course. So 8 ply in DDR ram or 16 ply in
>RDRAM is already a fact.

when did we go from 32 bytes to 64 bytes in cache on the Intel platform?
I missed that although I don't follow it extremely closely...

>Most important is however that these few tens of bytes are completely
>in the L2 for *sure* as you keep researching the same cache line.
>A hashtable in case of DIEP will be not in L2 of course. Diep is just
>too big to fit in 256KB L2. Not to mention some newer chips might
>be 128KB L2 and a bigger L3.
>So for the small array i usually lose like 5 clocks or so to L2
>cache. For the hashtable hashing i lose for near sureness about
>200 clocks for at least 1 lookup in hashtable.
>For me a hashtable is dead slow here therefore compared to
>the array alternative.
>In the far endgame i care shit actually whether i lose 20 clocks
>to seeing 10 moves. Note this still is faster than the 200 clocks
>you lose. The 'worst case' 40 move behaviour, is of course no
>issue, because the chance is like 0 then already that it isn't
>a draw, so the array approach loses clocks at the right time.

You don't necessarily lose 200 clocks...  the repetition hash table is
very small, and is called often enough that it should stick in L2 quite
well, which means no penalty and no loop either.

>Still seeing 40 moves with just 1 branch misprediction to
>the loop (fall through principle) is still way cheaper than
>the 200 clocks i might burn on something in DDR ram.
>On average the array idea will be around a factor 20 faster than
>than the hashtable idea, when taking into account how the hardware
>is now.

The list was slower for me.  I only changed back to the list because of SMP
issues I didn't like with the table.

>>The hash table trick does work well...  and it is particularly favorable when
>>you are in positions where you play a lot of moves with no pawn pushes or
>>captures so that the repetition list becomes lengthy.  The repetition detection
>>code is O(N) complexity, while the hash table detection is O(1).  If N is very
>>small, as it is in many positions, then the two are pretty equal in performance,
>>the O(1) of hashing is offset by 4 consecutive repetition signatures in a single
>>cache line.  But if N is large, then hashing is clearly faster.
>>I used this for a while and found that it worked just fine.  There is a cost,
>>in that when you enter a node, you have to store the hash signature, and when
>>you exit a node you have to remove or inactivate it.  So a little more work
>>on the back-side to clean it up as you back out of the search, less work on
>>the front as you don't have to search a long list for a match.  But for SMP
>>searching, I went back to the repetition list as each thread needs its own
>>unique repetition facility and copying a reasonable sized repetition table
>>is more expensive than just copying the appropriate part of the repetition
>>If it weren't for the parallel search, I would probably have stuck with the
>>hashing approach.  I believe Bruce used that at one point in time in Ferret.
>>I don't know whether he still does or not, however...
>>It isn't unreasonable.  It isn't really significant in a chess engine at all,
>>it is "just another way" to accomplish a common task...

This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.