Computer Chess Club Archives

Search

Terms
Messages

Subject: Re: new questions (scott gasch!)

Author: Anthony Cozzie
Date: 12:50:23 11/19/04
On November 19, 2004 at 15:34:32, Scott Gasch wrote:

>On November 19, 2004 at 15:06:27, Anthony Cozzie wrote:
>
>>On November 19, 2004 at 14:34:59, Scott Gasch wrote:
>>
>>>On November 19, 2004 at 11:28:34, Anthony Cozzie wrote:
>>>
>>>>Well, I also did a pretty complicated algorithm, and I designed my own.  If you
>>>>borrowed someone else's things might go easier for you, although I think that
>>>>takes a lot of the fun out of it. It is also possible you are simply more
>>>>talented than I am at this.   However, I wouldn't say that it works until you
>>>>have run it at a dual for a week or so with no problems.  I *thought* I had mine
>>>>working after about two weeks of implementation :)  I think maybe now I'm down
>>>>to only 1 bug that seems to only occur on ICC.
>>>>
>>>>The next task for you is to measure the speedup you get.  BTW, I do have to take
>>>>back my comment about ABDADA.  After talking with some people, it seems like
>>>>ABDADA gives a _really_ poor speedup (<1.3 or so, which is barely worth doing).
>>>>Of course, Scott's algorithm probably works better.  Scott, what kind of speedup
>>>>do you get?
>>>>
>>>>anthony
>>>
>>>Well the way I did it was to make the engine able to search in parallel and then
>>>pick a likely all-node to split the tree.  I do not split at the root and I do
>>>not split when there is not enough depth left in the tree to justify it.  Other
>>>than that anytime I'm on a PV node and have searched the first move and it
>>>failed low or anytime I'm on a non-PV node and the first N moves fail low (N for
>>>me = 3 but this is obviously adjustable) I split... assuming there is a split
>>>node available and an idle thread to help search.  This works pretty well, I'd
>>>say on average about 5-10% of my split nodes end up failing high.
>>>
>>>I've only tested on my dual proc 1.5ghz Athlon but I see a very good speedup
>>>most of the time... When I was measuring and coding this I was seeing 1.8x to
>>>1.9x on a dual proc.  My initial speedups were more on the order of about 1.4x
>>>or 1.5x but this was due to hash table contention and pawn hash contention (see
>>>below).  One of these days I will borrow a bigger machine at work and benchmark
>>>there.  If I do that I'll post results here.
>>>
>>>I have heard of some people not running into contention in the hash table when
>>>going multi-threaded.  For example one friend of mine did a MT search and ended
>>>up just locking the whole table when one thread was accessing it... and not
>>>seeign any slow down.  This was not my experience at all and so I end up locking
>>>the hash table in 1/64th of the hash table at a time.  As I scaled up the number
>>>of CPUs searching I suspect this would have to become 1/128th or 1/256th.  Also
>>>I have a multi-probe algorithm in the pawn hash table that I suspect would not
>>>scale out very well with N CPUs.  One solution to this I think would be to have
>>>a separate pawn hash table in each thread.
>>>
>>>Scott
>>>Scott
>>
>>So something fairly similar to Crafty?  Do you have a recursive or iterative
>>search?  Do you require the thread that initiated the split to return, or can
>>any thread return from a splitpoint?  How does it synchronize after aborts?
>>
>>anthony
>
>I don't know how Crafty works.  I know Bob was kind enough to send me a paper
>about DTS once, though... so I kinda assumed that was how crafty worked.
>
>Search is recursive.  Yes, the thread that initiated the split must be the last
>out of the split.  The way I did aborts is that every thread searching under a
>split has a pointer in its "thread context" to the split node it is working
>under.  When one thread fails high it toggles a flag in the split node.  Every N
>nodes all thread check to see if the move timer has expired... and if they are
>under a split they also see if their split has been aborted (i.e. some other
>thread failed high).  In either case the search just unrolls.
>
>Because the thread that initiated the split has to be the last out I double
>reference the split with that thread.  Before it returns to the main search
>frame that called into SearchInParallel that thread spins waiting on the split
>to become unreferenced by helper threads.  Because of this we want to limit the
>number of aborts and limit the amount of time between when one thread realizes
>it has failed high and when the rest of the helpers unroll.
>
>I suppose this could be optimized but with my dual proc it seems to work fine.
>
>Scott

OK, this sounds very very similar to the Crafty algorithm, which gives a pretty
reasonable speedup.  It is worth noting, however, that Crafty does not use DTS.
I hope you'll forgive me for not talking more about Zappa's parallel algorithm,
but I spent quite some time on it, and it requires an iterative search, so most
people wouldn't want to implement it anyway.

anthony
Re: new questions (scott gasch!) Scott Gasch 13:09:44 11/19/04
- Re: new questions (scott gasch!) Robert Hyatt 14:07:15 11/19/04
  - Re: new questions (scott gasch!) Scott Gasch 15:10:38 11/19/04
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.