Computer Chess Club Archives




Subject: Re: Fixed

Author: Angrim

Date: 21:52:07 02/21/02

Go up one level in this thread

On February 21, 2002 at 21:14:50, Vincent Diepeveen wrote:
>>On February 20, 2002 at 23:11:08, Angrim wrote:
>>>On February 20, 2002 at 19:28:42, Vincent Diepeveen wrote:
>>>>You measured pretty bad then depending upon hashtable size. An array
>>>>that gets read within 1 cache line and is already within L1 or L2
>>>>is kicking the hell out of this hashtable approach.
>>>Depends on the exact design. If the hash table is small enough then
>>>it will tend to be in L2 most of the time.  A 400 entry hash table
>>>is quite large enough to handle triple rep detection.
>>>Also, how are you fitting your array into a single cache line
>>>when you are searching 10+ ply deep?  It seems to me that
>>>you have to be storing at least 10, 8 byte hash entries in
>>>that array by the time you reach a leaf node.
>the hashtable trick works yes and is proven to be working.
>the array trick is way faster.
>usually every other move is a capture or pawnmove in 99.999% of
>all computerlines.

This seems like an exageration. maybe you meant 99% ?
Anyway, I guess your point was that you can stop looking for a match
in the list once you reach a non-reversible move.  Certainly valid.

>So usually only a few ply must be searched.
>Right now a single cache line is 64 bytes. RDRAM is already 128 bytes
>and RDRAM is the future of course. So 8 ply in DDR ram or 16 ply in
>RDRAM is already a fact.

This statement really surprised me, I thought that cache lines were still
16 bytes.  So I went and looked it up online, and learned something :)
My Athlon seems to have a 64 byte cache line, the new P4 chips have
a 128byte cache line.  I guess I should be aligning some of my more
critical structures on 64 byte boundaries instead of 16 byte..

>Most important is however that these few tens of bytes are completely
>in the L2 for *sure* as you keep researching the same cache line.
>A hashtable in case of DIEP will be not in L2 of course. Diep is just
>too big to fit in 256KB L2. Not to mention some newer chips might
>be 128KB L2 and a bigger L3.

Well, in my specific case(medium size engine, Athlon chip) I have
a fair bit of spare room in the L2 cache.

>So for the small array i usually lose like 5 clocks or so to L2
>cache. For the hashtable hashing i lose for near sureness about
>200 clocks for at least 1 lookup in hashtable.
>For me a hashtable is dead slow here therefore compared to
>the array alternative.
>In the far endgame i care shit actually whether i lose 20 clocks
>to seeing 10 moves. Note this still is faster than the 200 clocks
>you lose. The 'worst case' 40 move behaviour, is of course no
>issue, because the chance is like 0 then already that it isn't
>a draw, so the array approach loses clocks at the right time.
>Still seeing 40 moves with just 1 branch misprediction to
>the loop (fall through principle) is still way cheaper than
>the 200 clocks i might burn on something in DDR ram.
>On average the array idea will be around a factor 20 faster than
>than the hashtable idea, when taking into account how the hardware
>is now.

It still sounds wrong that a single memory lookup should be much
slower than a series of memory lookups.  I wish there was a way to
test this that was simpler than replacing my triple rep code
completely and running a set of tests.


This page took 0.01 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.