Author: Anthony Cozzie
Date: 12:50:23 11/19/04
Go up one level in this thread
On November 19, 2004 at 15:34:32, Scott Gasch wrote: >On November 19, 2004 at 15:06:27, Anthony Cozzie wrote: > >>On November 19, 2004 at 14:34:59, Scott Gasch wrote: >> >>>On November 19, 2004 at 11:28:34, Anthony Cozzie wrote: >>> >>>>Well, I also did a pretty complicated algorithm, and I designed my own. If you >>>>borrowed someone else's things might go easier for you, although I think that >>>>takes a lot of the fun out of it. It is also possible you are simply more >>>>talented than I am at this. However, I wouldn't say that it works until you >>>>have run it at a dual for a week or so with no problems. I *thought* I had mine >>>>working after about two weeks of implementation :) I think maybe now I'm down >>>>to only 1 bug that seems to only occur on ICC. >>>> >>>>The next task for you is to measure the speedup you get. BTW, I do have to take >>>>back my comment about ABDADA. After talking with some people, it seems like >>>>ABDADA gives a _really_ poor speedup (<1.3 or so, which is barely worth doing). >>>>Of course, Scott's algorithm probably works better. Scott, what kind of speedup >>>>do you get? >>>> >>>>anthony >>> >>>Well the way I did it was to make the engine able to search in parallel and then >>>pick a likely all-node to split the tree. I do not split at the root and I do >>>not split when there is not enough depth left in the tree to justify it. Other >>>than that anytime I'm on a PV node and have searched the first move and it >>>failed low or anytime I'm on a non-PV node and the first N moves fail low (N for >>>me = 3 but this is obviously adjustable) I split... assuming there is a split >>>node available and an idle thread to help search. This works pretty well, I'd >>>say on average about 5-10% of my split nodes end up failing high. >>> >>>I've only tested on my dual proc 1.5ghz Athlon but I see a very good speedup >>>most of the time... When I was measuring and coding this I was seeing 1.8x to >>>1.9x on a dual proc. My initial speedups were more on the order of about 1.4x >>>or 1.5x but this was due to hash table contention and pawn hash contention (see >>>below). One of these days I will borrow a bigger machine at work and benchmark >>>there. If I do that I'll post results here. >>> >>>I have heard of some people not running into contention in the hash table when >>>going multi-threaded. For example one friend of mine did a MT search and ended >>>up just locking the whole table when one thread was accessing it... and not >>>seeign any slow down. This was not my experience at all and so I end up locking >>>the hash table in 1/64th of the hash table at a time. As I scaled up the number >>>of CPUs searching I suspect this would have to become 1/128th or 1/256th. Also >>>I have a multi-probe algorithm in the pawn hash table that I suspect would not >>>scale out very well with N CPUs. One solution to this I think would be to have >>>a separate pawn hash table in each thread. >>> >>>Scott >>>Scott >> >>So something fairly similar to Crafty? Do you have a recursive or iterative >>search? Do you require the thread that initiated the split to return, or can >>any thread return from a splitpoint? How does it synchronize after aborts? >> >>anthony > >I don't know how Crafty works. I know Bob was kind enough to send me a paper >about DTS once, though... so I kinda assumed that was how crafty worked. > >Search is recursive. Yes, the thread that initiated the split must be the last >out of the split. The way I did aborts is that every thread searching under a >split has a pointer in its "thread context" to the split node it is working >under. When one thread fails high it toggles a flag in the split node. Every N >nodes all thread check to see if the move timer has expired... and if they are >under a split they also see if their split has been aborted (i.e. some other >thread failed high). In either case the search just unrolls. > >Because the thread that initiated the split has to be the last out I double >reference the split with that thread. Before it returns to the main search >frame that called into SearchInParallel that thread spins waiting on the split >to become unreferenced by helper threads. Because of this we want to limit the >number of aborts and limit the amount of time between when one thread realizes >it has failed high and when the rest of the helpers unroll. > >I suppose this could be optimized but with my dual proc it seems to work fine. > >Scott OK, this sounds very very similar to the Crafty algorithm, which gives a pretty reasonable speedup. It is worth noting, however, that Crafty does not use DTS. I hope you'll forgive me for not talking more about Zappa's parallel algorithm, but I spent quite some time on it, and it requires an iterative search, so most people wouldn't want to implement it anyway. anthony
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.