Author: Don Dailey
Date: 07:51:52 01/23/99
On January 22, 1999 at 13:34:31, KarinsDad wrote:

>On January 22, 1999 at 13:05:51, Eugene Nalimov wrote:
>
>>On January 22, 1999 at 08:52:52, Robert Hyatt wrote:
>>
>>>On January 22, 1999 at 00:31:40, KarinsDad wrote:
>>>
>>>>On January 21, 1999 at 23:46:27, Robert Hyatt wrote:
>>>>
>>>>>On January 21, 1999 at 19:04:21, KarinsDad wrote:
>>>>>
>>>>>>On January 21, 1999 at 12:37:32, Heiner Marxen wrote:
>>>>>>
>>>>>>[snip]
>>>>>>
>>>>>>>Another method (which I happen to use) to avoid bucket overlaps
>>>>>>>is "quadratic rehashing". Instead of using a constant increment,
>>>>>>>we linearly increase the steps: inc=1,2,3,4,5,...
>>>>>>>
>>>>>>>XX.X..X...X....X
>>>>>>>.YY.Y..Y...Y....Y
>>>>>>>..ZZ.Z..Z...Z....Z
>>>>>>>
>>>>>>>Also, the slots of a bucket are still somewhat nearby, which is good
>>>>>>>for cache locality.
>>>>>>>
>>>>>>>Knuth seems not to mention this method, and I do not remember where
>>>>>>>I found it, so I cannot give a reference.
>>>>>>>
>>>>>>>Heiner
>>>>>>
>>>>>>I like this. I think I'll try it out. However, I think it needs a
>>>>>>modification due to the following possibility:
>>>>>>
>>>>>>XX.X..X.........
>>>>>>YY.Y..Y...Y.....
>>>>>>ZZ.Z..Z...Z....Z
>>>>>>
>>>>>>Thanks Heiner :) :)
>>>>>>
>>>>>>KarinsDad
>>>>>
>>>>>Just note this will kill a PC, because it fetches 32-byte lines from
>>>>>memory in a burst cycle. For best performance you should use all of
>>>>>those 32 bytes, since you had to wait on them. If you do anything
>>>>>else, you will end up hitting one line per probe, with a big
>>>>>cache-miss penalty for each probe.
>>>>>
>>>>>This only works on architectures like the Cray, where memory is
>>>>>organized very differently, with no cache of any kind... just high
>>>>>sustained throughput.
>>>>
>>>>Robert,
>>>>
>>>>Thanks for the input.
>>>>
>>>>I have been thinking about this method a little this evening, and
>>>>although I came up with some good ideas, I keep running into the
>>>>problem: when 95% of the hash is full, how can I continue to find
>>>>open slots without overwriting nodes (for my program, that is
>>>>currently preferable)?
>>>>
>>>>It may not be solvable this way, or at least not in a reasonable
>>>>amount of time.
>>>>
>>>>On the other hand, each of the solutions in this thread seems to run
>>>>into that problem to some extent.
>>>>
>>>>KarinsDad :)
>>>
>>>You can't go very far here, because you probe at _every_ node in the
>>>tree. If you make your probe 10x slower by looping, you will _really_
>>>feel the slow-down, and it isn't clear that getting too cute is
>>>important when you can buy a gig of memory for about $1,000 now...
>>
>>That depends. Not every chess program user can afford to buy 1Gb of
>>RAM, and most popular motherboards will not accept that much.
>>
>>Based on my experience, it all depends on how fast you can fill your
>>hash table. As soon as it's *totally* full, a more complex replacement
>>scheme can be better, even if it causes some nps degradation.
>>
>>As long as the program has a lot of RAM and plays mostly blitz (or
>>fast enough games), simple schemes should be Ok. Let's take Crafty as
>>an example (all numbers are very crude approximations): its current
>>speed is ~1Mnps. As you don't cache q-nodes, it has to store 500K
>>positions per second. So, when the time control is "move in 10
>>seconds", you have to store at most 5M positions. That's only 40Mb.
>>I think that serious degradation will start when you have to store
>>10 times as many positions as you have hash entries.
>>So, on your machine you will probably be Ok even for games with
>>standard time controls, if you allocate 128Mb for hash.
>>
>>The average Crafty user has a much smaller hash. On the other hand,
>>his/her nps is much lower, too...
>>
>>So for current machines you may be right... But if you try to play
>>the 1-to-100 match again, or run your program on WinCE (reasonably
>>fast CPU and 2Mb of RAM), more complex hash management will be
>>worthwhile.
>>
>>Eugene
>
>Yes, these are considerations I had to make when designing my program.
>My theory is that it should run well on average systems from the last
>few years at standard tournament speeds (e.g. 40 in 120, rest in 60).
>
>My personal view is that although blitz chess is fine for a fun game,
>it's a game of who can swindle whom the quickest. Blunders are the name
>of the game. It's next to worthless to play it against a computer, even
>if you crank the program down a lot (for most players).
>
>So, my initial design goals were to create a good-sized hash as my
>"maximum" (such as 32MB), and then attempt an implementation that would
>"fit" within that maximum with a minimum of overwrites or replacements
>of nodes.
>
>If my program plays lousy chess against other programs at blitz, but
>plays strong chess at slower time controls, then I have still achieved
>one of my design goals. Regardless of how I program it, it should still
>play a fair blitz game against humans (i.e. me).
>
>KarinsDad

Hi KarinsDad,

I am wondering if you knew that even if you doubled your hash table
size, you would only get about a 6 percent node count reduction. But we
are not talking about 6 percent; we are talking about being paranoid
about what you do when your table is 95 percent full. You are in effect
seeking a solution (and apparently willing to pay any cost) to improve
your odds of finding a spare location in your table. And even this only
addresses (no pun intended) a very narrow range of time controls:
anything longer and your table is full anyway, anything shorter and your
table is fine.

Really, you are talking about adding a lot of baggage to your search to
get perhaps a 1 or 2 percent node count reduction. And the baggage
itself is counterproductive: linked lists reduce the number of hash
table entries you get from a given amount of memory, and multiple probes
will more than likely eat enough processor time to kill any tiny node
reduction you get. Any elaborate scheme you can think of might be
theoretically fun to work with, but useless. Remember, you are fighting
for almost nothing: 1 or 2 percent.

I think if I were going to use anything more complicated than simple
open-addressed hash tables with overwriting, I would probably experiment
with set-associative schemes. I think Bob may have used this in Cray
Blitz; you'll have to ask him for the details. The basic idea (in case
you do not already know it, forgive me if you do) is that you partition
your table entries into sets of n records (try something like 4 or 8)
and confine your probes to those entries. You might just sequentially
look at these n records for a hit, or for an empty location if you are
writing; there is a sketch below. But everything I said applies here
too: this would at very most be a minor improvement, if any at all. I
have a recollection that there was something different about the Cray
(as told to me by Bob) that made this scheme nicely workable there.

I have experimented with lots of stuff (just like you are proposing) and
never came up with anything better than KISS (keep it simple, stupid).
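For concreteness, here is roughly what the quadratic rehashing Heiner
describes in the quoted thread above would look like in C. This is only
a sketch: the Entry layout, table size, and probe limit are invented for
illustration, and treating key == 0 as "empty" is a simplification a
real program would refine.

#define TABLE_BITS 20                     /* 2^20 entries, for example */
#define TABLE_SIZE (1u << TABLE_BITS)
#define MAX_PROBES 8                      /* give up after a few tries */

typedef struct {
    unsigned long long key;               /* full hash signature       */
    int score;
    int depth;
    /* ... best move, bound type, etc. ... */
} Entry;

Entry table[TABLE_SIZE];

/* Probe slots h, h+1, h+3, h+6, h+10, ... (mod table size): the
   increment grows by 1 each step, exactly inc = 1, 2, 3, 4, 5, ... */
Entry *probe(unsigned long long key)
{
    unsigned slot = (unsigned)(key & (TABLE_SIZE - 1));
    unsigned inc;

    for (inc = 1; inc <= MAX_PROBES; inc++) {
        Entry *e = &table[slot];
        if (e->key == key || e->key == 0) /* hit, or an empty slot     */
            return e;
        slot = (slot + inc) & (TABLE_SIZE - 1);
    }
    return 0;                             /* bucket full: caller must
                                             overwrite or give up      */
}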
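And here is a sketch of the set-associative idea just described, reusing
the Entry struct from the sketch above. The set size, table size, and
the shallowest-depth victim policy are again only illustrative; I am not
claiming this is what Cray Blitz did.

#define SET_SIZE 4                        /* 4-way sets                */
#define NUM_SETS (1u << 18)               /* 2^18 sets = 1M entries    */

Entry sets[NUM_SETS][SET_SIZE];

/* The low hash bits pick one set; all probes stay inside it, so on a
   PC the whole set can sit in one or two cache lines. */
Entry *probe_set(unsigned long long key)
{
    Entry *set = sets[(unsigned)(key & (NUM_SETS - 1))];
    Entry *victim = &set[0];
    int i;

    for (i = 0; i < SET_SIZE; i++) {
        if (set[i].key == key || set[i].key == 0)
            return &set[i];               /* hit, or a free slot       */
        if (set[i].depth < victim->depth)
            victim = &set[i];
    }
    return victim;                        /* set full: overwrite the
                                             shallowest entry          */
}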
A lot more interesting and beneficial is knowing WHEN to overwrite an
entry and when not to. This might gain you a lot, because some overwrite
schemes can lose important information. Using 2 hash tables is one
example of such a scheme. I don't know if you are doing this, but it
will gain you much more than figuring out a way to find an empty table
location on the 8th probe when your tables just happen to be about 95
percent full.

You must have done some timing tests, right? What kind of data do you
have to report? Or are you just in the design phase and considering your
options?

- Don
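P.S. In case it is useful, here is one rough sketch of the two-table
idea, again reusing the Entry struct from the sketches above, with
invented names. One table is depth-preferred (only overwritten by a
search of equal or greater depth); the other is always overwritten, so
recent entries are never lost. Several variants exist; this is just one
way to wire it up.

/* mask is table_size - 1 for each (power-of-two sized) table. */
void store(Entry *depth_tab, Entry *always_tab, unsigned mask,
           unsigned long long key, int depth, int score)
{
    Entry *d = &depth_tab[(unsigned)(key & mask)];
    Entry *a = &always_tab[(unsigned)(key & mask)];

    if (depth >= d->depth) {              /* deep enough to keep       */
        d->key = key; d->depth = depth; d->score = score;
    } else {                              /* too shallow for the deep
                                             table, so store it in the
                                             always-replace table      */
        a->key = key; a->depth = depth; a->score = score;
    }
}

/* When probing, check the depth-preferred table first, then the
   always-replace table. */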