Computer Chess Club Archives




Subject: Re: Fixed

Author: Vincent Diepeveen

Date: 17:18:35 02/22/02

Go up one level in this thread

On February 22, 2002 at 00:52:07, Angrim wrote:

>On February 21, 2002 at 21:14:50, Vincent Diepeveen wrote:
>>>On February 20, 2002 at 23:11:08, Angrim wrote:
>>>>On February 20, 2002 at 19:28:42, Vincent Diepeveen wrote:
>>>>>You measured pretty bad then depending upon hashtable size. An array
>>>>>that gets read within 1 cache line and is already within L1 or L2
>>>>>is kicking the hell out of this hashtable approach.
>>>>Depends on the exact design. If the hash table is small enough then
>>>>it will tend to be in L2 most of the time.  A 400 entry hash table
>>>>is quite large enough to handle triple rep detection.
>>>>Also, how are you fitting your array into a single cache line
>>>>when you are searching 10+ ply deep?  It seems to me that
>>>>you have to be storing at least 10, 8 byte hash entries in
>>>>that array by the time you reach a leaf node.
>>the hashtable trick works yes and is proven to be working.
>>the array trick is way faster.
>>usually every other move is a capture or pawnmove in 99.999% of
>>all computerlines.
>This seems like an exageration. maybe you meant 99% ?

well 99 or 99.9999 the idea is clear, because
one of the sides tries some stupid moves ( < alfa ) and
the other side can capture them.

the deeper you search in middlegame the bigger the chance
there are such stupid moves that captures are possible. also
most programs try captures first, so it is not very uncoincidental
that a capture is in the line usually.

add to that that all pawn moves also stop the search.

>Anyway, I guess your point was that you can stop looking for a match
>in the list once you reach a non-reversible move.  Certainly valid.
>>So usually only a few ply must be searched.
>>Right now a single cache line is 64 bytes. RDRAM is already 128 bytes
>>and RDRAM is the future of course. So 8 ply in DDR ram or 16 ply in
>>RDRAM is already a fact.
>This statement really surprised me, I thought that cache lines were still
>16 bytes.  So I went and looked it up online, and learned something :)

Wow too stubborn to even believe i remembered correctly... :)

>My Athlon seems to have a 64 byte cache line, the new P4 chips have
>a 128byte cache line.  I guess I should be aligning some of my more
>critical structures on 64 byte boundaries instead of 16 byte..
>>Most important is however that these few tens of bytes are completely
>>in the L2 for *sure* as you keep researching the same cache line.
>>A hashtable in case of DIEP will be not in L2 of course. Diep is just
>>too big to fit in 256KB L2. Not to mention some newer chips might
>>be 128KB L2 and a bigger L3.
>Well, in my specific case(medium size engine, Athlon chip) I have
>a fair bit of spare room in the L2 cache.

but trashing it with a huge hashtable, *just for repetition
lookups*, doesn't seem like a good idea to me.

>>So for the small array i usually lose like 5 clocks or so to L2
>>cache. For the hashtable hashing i lose for near sureness about
>>200 clocks for at least 1 lookup in hashtable.
>>For me a hashtable is dead slow here therefore compared to
>>the array alternative.
>>In the far endgame i care shit actually whether i lose 20 clocks
>>to seeing 10 moves. Note this still is faster than the 200 clocks
>>you lose. The 'worst case' 40 move behaviour, is of course no
>>issue, because the chance is like 0 then already that it isn't
>>a draw, so the array approach loses clocks at the right time.
>>Still seeing 40 moves with just 1 branch misprediction to
>>the loop (fall through principle) is still way cheaper than
>>the 200 clocks i might burn on something in DDR ram.
>>On average the array idea will be around a factor 20 faster than
>>than the hashtable idea, when taking into account how the hardware
>>is now.
>It still sounds wrong that a single memory lookup should be much
>slower than a series of memory lookups.  I wish there was a way to
>test this that was simpler than replacing my triple rep code
>completely and running a set of tests.

A lookup to the DDR ram is huge number of clocks. Up to 300 processor
clocks for the newer cpu's perhaps (> 200 for sure).

A lookup to L2 is way faster.

Making a hashtable of a small size is not so smart because to correctly
detect repetition you then need also to add a linked list datastructure
to it and an 'unhash'.


This page took 0.01 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.