Computer Chess Club Archives


Search

Terms

Messages

Subject: Re: new questions (scott gasch!)

Author: Scott Gasch

Date: 12:34:32 11/19/04

Go up one level in this thread


On November 19, 2004 at 15:06:27, Anthony Cozzie wrote:

>On November 19, 2004 at 14:34:59, Scott Gasch wrote:
>
>>On November 19, 2004 at 11:28:34, Anthony Cozzie wrote:
>>
>>>Well, I also did a pretty complicated algorithm, and I designed my own.  If you
>>>borrowed someone else's things might go easier for you, although I think that
>>>takes a lot of the fun out of it. It is also possible you are simply more
>>>talented than I am at this.   However, I wouldn't say that it works until you
>>>have run it at a dual for a week or so with no problems.  I *thought* I had mine
>>>working after about two weeks of implementation :)  I think maybe now I'm down
>>>to only 1 bug that seems to only occur on ICC.
>>>
>>>The next task for you is to measure the speedup you get.  BTW, I do have to take
>>>back my comment about ABDADA.  After talking with some people, it seems like
>>>ABDADA gives a _really_ poor speedup (<1.3 or so, which is barely worth doing).
>>>Of course, Scott's algorithm probably works better.  Scott, what kind of speedup
>>>do you get?
>>>
>>>anthony
>>
>>Well the way I did it was to make the engine able to search in parallel and then
>>pick a likely all-node to split the tree.  I do not split at the root and I do
>>not split when there is not enough depth left in the tree to justify it.  Other
>>than that anytime I'm on a PV node and have searched the first move and it
>>failed low or anytime I'm on a non-PV node and the first N moves fail low (N for
>>me = 3 but this is obviously adjustable) I split... assuming there is a split
>>node available and an idle thread to help search.  This works pretty well, I'd
>>say on average about 5-10% of my split nodes end up failing high.
>>
>>I've only tested on my dual proc 1.5ghz Athlon but I see a very good speedup
>>most of the time... When I was measuring and coding this I was seeing 1.8x to
>>1.9x on a dual proc.  My initial speedups were more on the order of about 1.4x
>>or 1.5x but this was due to hash table contention and pawn hash contention (see
>>below).  One of these days I will borrow a bigger machine at work and benchmark
>>there.  If I do that I'll post results here.
>>
>>I have heard of some people not running into contention in the hash table when
>>going multi-threaded.  For example one friend of mine did a MT search and ended
>>up just locking the whole table when one thread was accessing it... and not
>>seeign any slow down.  This was not my experience at all and so I end up locking
>>the hash table in 1/64th of the hash table at a time.  As I scaled up the number
>>of CPUs searching I suspect this would have to become 1/128th or 1/256th.  Also
>>I have a multi-probe algorithm in the pawn hash table that I suspect would not
>>scale out very well with N CPUs.  One solution to this I think would be to have
>>a separate pawn hash table in each thread.
>>
>>Scott
>>Scott
>
>So something fairly similar to Crafty?  Do you have a recursive or iterative
>search?  Do you require the thread that initiated the split to return, or can
>any thread return from a splitpoint?  How does it synchronize after aborts?
>
>anthony

I don't know how Crafty works.  I know Bob was kind enough to send me a paper
about DTS once, though... so I kinda assumed that was how crafty worked.

Search is recursive.  Yes, the thread that initiated the split must be the last
out of the split.  The way I did aborts is that every thread searching under a
split has a pointer in its "thread context" to the split node it is working
under.  When one thread fails high it toggles a flag in the split node.  Every N
nodes all thread check to see if the move timer has expired... and if they are
under a split they also see if their split has been aborted (i.e. some other
thread failed high).  In either case the search just unrolls.

Because the thread that initiated the split has to be the last out I double
reference the split with that thread.  Before it returns to the main search
frame that called into SearchInParallel that thread spins waiting on the split
to become unreferenced by helper threads.  Because of this we want to limit the
number of aborts and limit the amount of time between when one thread realizes
it has failed high and when the rest of the helpers unroll.

I suppose this could be optimized but with my dual proc it seems to work fine.

Scott



This page took 0 seconds to execute

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.