Author: Angrim
Date: 21:52:07 02/21/02
Go up one level in this thread
On February 21, 2002 at 21:14:50, Vincent Diepeveen wrote: >>On February 20, 2002 at 23:11:08, Angrim wrote: >>>On February 20, 2002 at 19:28:42, Vincent Diepeveen wrote: <snip> >>>>You measured pretty bad then depending upon hashtable size. An array >>>>that gets read within 1 cache line and is already within L1 or L2 >>>>is kicking the hell out of this hashtable approach. >>> >>>Depends on the exact design. If the hash table is small enough then >>>it will tend to be in L2 most of the time. A 400 entry hash table >>>is quite large enough to handle triple rep detection. >>>Also, how are you fitting your array into a single cache line >>>when you are searching 10+ ply deep? It seems to me that >>>you have to be storing at least 10, 8 byte hash entries in >>>that array by the time you reach a leaf node. >>> >>>Angrim > >the hashtable trick works yes and is proven to be working. > >the array trick is way faster. > >usually every other move is a capture or pawnmove in 99.999% of >all computerlines. This seems like an exageration. maybe you meant 99% ? Anyway, I guess your point was that you can stop looking for a match in the list once you reach a non-reversible move. Certainly valid. >So usually only a few ply must be searched. > >Right now a single cache line is 64 bytes. RDRAM is already 128 bytes >and RDRAM is the future of course. So 8 ply in DDR ram or 16 ply in >RDRAM is already a fact. This statement really surprised me, I thought that cache lines were still 16 bytes. So I went and looked it up online, and learned something :) My Athlon seems to have a 64 byte cache line, the new P4 chips have a 128byte cache line. I guess I should be aligning some of my more critical structures on 64 byte boundaries instead of 16 byte.. >Most important is however that these few tens of bytes are completely >in the L2 for *sure* as you keep researching the same cache line. > >A hashtable in case of DIEP will be not in L2 of course. Diep is just >too big to fit in 256KB L2. Not to mention some newer chips might >be 128KB L2 and a bigger L3. Well, in my specific case(medium size engine, Athlon chip) I have a fair bit of spare room in the L2 cache. >So for the small array i usually lose like 5 clocks or so to L2 >cache. For the hashtable hashing i lose for near sureness about >200 clocks for at least 1 lookup in hashtable. > >For me a hashtable is dead slow here therefore compared to >the array alternative. > >In the far endgame i care shit actually whether i lose 20 clocks >to seeing 10 moves. Note this still is faster than the 200 clocks >you lose. The 'worst case' 40 move behaviour, is of course no >issue, because the chance is like 0 then already that it isn't >a draw, so the array approach loses clocks at the right time. > >Still seeing 40 moves with just 1 branch misprediction to >the loop (fall through principle) is still way cheaper than >the 200 clocks i might burn on something in DDR ram. > >On average the array idea will be around a factor 20 faster than >than the hashtable idea, when taking into account how the hardware >is now. > It still sounds wrong that a single memory lookup should be much slower than a series of memory lookups. I wish there was a way to test this that was simpler than replacing my triple rep code completely and running a set of tests. Angrim
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.