Author: Robert Hyatt
Date: 16:32:16 06/10/98
On June 10, 1998 at 17:07:58, Don Dailey wrote:

>On June 08, 1998 at 23:17:25, Robert Hyatt wrote:
>
>>I have been experimenting with the SMP search stuff in Crafty, as most
>>of you know.  Thought I would give a brief update of the current status,
>>just for information.
>>
>>Before going to the "full Cray Blitz" type of search, I wanted to play
>>around with the Young Brothers Wait (Feldmann, et al.) algorithm, just
>>to see what it is capable of producing.  It has been working pretty
>>well, although I have found some interesting angles that need more
>>study.
>
>
>Hi Bob,
>
>I am surprised to hear you are not using this algorithm, since I assumed
>all good parallel programs used it.  You have reported performance
>numbers in the past that seemed correct.  Perhaps your algorithm
>actually achieves the same effect through a different mechanism.  I often
>marvel at how often this happens in computer science.
>
>I need to re-read your last article describing your parallelism in
>the ICCA journal.  When I first looked at it, it seemed unduly
>complicated compared to what I'm doing, so I didn't try too hard
>to understand it.  Our stuff is very simple and gives the same
>numbers you reported, so I chalked it up to the "different roads all
>lead to Rome" phenomenon.  But if you were not using the YBW
>stuff, I'm curious about how it works so well.
>

The main point in the Cray Blitz approach was to predict *up front* whether
a node was "ALL" or "CUT", and to avoid splitting at CUT nodes, while
searching an ALL node in parallel even before the first branch had been
searched sequentially.  It worked well, but was unduly complicated... which
is why I wanted to try YBW first.  My first parallel search was PVS, which
is a restricted form of YBW, so I had played with the idea before.  But YBW
has wait time, because you have to search the left-most PV nodes serially...
and Cray Blitz never had such a constraint.

>Our algorithm is simply this:
>
>Do this recursively and at any node (except quies):
>
>  1. Search first move serially.  (Take cutoff if possible)
>
>  2. Spawn all siblings of first move in parallel.  If any
>     moves return >=beta, then abort work at this level.

This is what I am doing, although I am now searching the first *two* moves
serially before allowing a parallel search to happen.  Search overhead
(extra nodes) dropped way down, with only a minor increase in idle time in
fast (blitz) searches, and almost none in longer searches (in 30 sec/move
games, I generally see 398% cpu utilization, for example...).  I'm going to
continue to investigate this... the more moves that are searched before the
parallel search starts, the greater the probability that the work done in
parallel won't be wasted, and that wasted work is currently a significant
limit on performance.

>Spawn is really "post work", rarely does the work actually
>migrate to other processors (32 processors, millions of nodes
>per second, not too many actually move to other processors)

I don't even post... when one processor notices that another is idle, it
"invites" the idle one to help at the current node, assuming two branches
have already been searched..  Simple, elegant, and so far, bug-free...
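
To make that "invite" idea concrete, here is a rough C sketch of the kind of
shared structure such a split needs.  This is purely illustrative and is not
Crafty's actual code; the SplitPoint type and the NextMove() helper are names
invented for the sketch.

#include <pthread.h>

/* One shared split point: the busy processor fills this in at the node it
 * wants help with (after two branches are already done), and the invited
 * processor pulls moves from it under the lock until the move list is
 * exhausted or a beta cutoff stops work at this node.                      */
typedef struct SplitPoint {
  pthread_mutex_t lock;     /* protects the move list and bounds            */
  int moves[256];           /* remaining unsearched moves at this node      */
  int next, count;          /* next index to hand out / total move count    */
  int alpha, beta;          /* current bounds, updated as results come back */
  volatile int stop;        /* set when any move returns a score >= beta    */
  volatile int active;      /* number of processors still working here      */
} SplitPoint;

/* Hand the next unsearched move to whichever processor asks for it.
 * Returns 0 when there is nothing left to do at this split point.          */
int NextMove(SplitPoint *sp, int *move) {
  int have_work = 0;
  pthread_mutex_lock(&sp->lock);
  if (!sp->stop && sp->next < sp->count) {
    *move = sp->moves[sp->next++];
    have_work = 1;
  }
  pthread_mutex_unlock(&sp->lock);
  return have_work;
}

Because the owner and the helper both draw moves through the same NextMove()
call, neither duplicates work, and a cutoff found by either one stops the
other as soon as it asks for its next move.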
>We don't think or care about how many processors are there.
>Our parallelism is based on work stealing, so a processor
>randomly polls other processors when it runs out of work,
>and work will be grabbed from a work queue.  The work is
>grabbed, of course, oldest work first.  There is no master
>process, everything is very clean and orderly, and the code
>looks almost identical to serial code.  All the scheduling
>stuff is done by the run time system (Cilk is really C code
>with minor extensions added).  Cilk is very fast and tight,
>and was designed for computer chess although it's now used
>for many things.

I have no master process either.  And no processor *ever* sits waiting for
something to do, except for the length of time it takes to expand one node
and then prepare for the "split"..

>
>Sorry about the sales pitch!  Anyway, I'm interested in
>looking at your algorithm.  Do you think you are essentially
>doing the same thing I do, or is your stuff fundamentally
>different?
>

Probably quite similar.  The only major problem is that recursive alpha/beta
means the processor that enters a node must be the one that exits that same
node and returns to the previous level... but if that processor runs out of
work first, it also has to be able to help others without sitting idle.  The
data structure for that is quite cute...
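
A rough sketch of that idea, again illustrative only and not Crafty's actual
data structure: the processor that owns a split point cannot unwind its
recursive Search() call until every helper is finished there, so while it
waits it volunteers at somebody else's split point instead of spinning.
HelpersStillActive(), FindBusySplitPoint() and HelpAt() are hypothetical
helpers invented for the sketch.

typedef struct SplitPoint SplitPoint;    /* same split point as sketched above */

int         HelpersStillActive(SplitPoint *sp);   /* hypothetical              */
SplitPoint *FindBusySplitPoint(int my_id);        /* hypothetical              */
void        HelpAt(SplitPoint *sp, int my_id);    /* hypothetical              */

/* The owner of 'mine' has no moves left there, but cannot return from the
 * recursive call until its helpers finish, so it helps elsewhere meanwhile.   */
void WaitAndHelp(SplitPoint *mine, int my_id) {
  while (HelpersStillActive(mine)) {
    SplitPoint *other = FindBusySplitPoint(my_id);
    if (other)
      HelpAt(other, my_id);   /* search someone else's moves while waiting     */
    /* a real implementation would yield or block briefly when nothing is
       joinable, rather than spinning in this loop                             */
  }
  /* every helper has finished: safe to back up the score and return           */
}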
>- Don
>
>
>
>>First some results.  This run was on my ALR 4 processor machine, with the
>>current version of Crafty:
>>
>>       0  20  40  60  80 100 120 140 160 180 200 220 240 260 280
>>   +------------------------------------------------------------
>> 1 |   0   0   7   0   0   1   0  31   0   0   0   0   1   0   0
>> 2 |   0   0   0   0   0   0   0   0   0   0   0   4   0   1   0
>> 3 |   0   0   0   0   0   0   0   0   2   0   0  29   4   0   4
>> 4 |   0   0   0   0   0   0   0   0   0   0   0   0   0  17   0
>> 5 |   0   0   0   0   0   0   0   2   0   0   0   0   0   5   0
>> 6 |   0  11   0   1   1   0   0   0   0   0   0   0   0   0   0
>> 7 |   0   0   0   0   0   0   0   0   0   0   0   0   1   0   1
>> 8 |   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
>> 9 |   0   0   0   0   0   0   0   0   0   0   0   9   0  20   0
>>10 |   0   0   0   0   0   0   0   0   0   1   0  --   0   0   0
>>11 |   0   0   0   0   5   0   1   0   0   0   0   0   0   0   3
>>12 |   0   0   0   0   8   0   0   0   0   0   0   0   1   0   0
>>13 |   0   0   0   0   0   0   0   0   0   0   0   0   0   0  42
>>14 |   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0
>>15 |   0   0   2   0   0   0   0   3   0   0   0   2   0   0   0
>>16 |   0   0   0   0   0   0   0   0   0   3   0   0   0   0   0
>>17 |   0   0   0   0   0   0   0   0   0   0   0   2   0   0   1
>>18 |   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0
>>19 |   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
>>20 |   0   0   0   1   3   0   0   0   1   4   0   0   0   0   0
>>incorrect: 230
>>sum of times(squared)=19155
>>
>>The above table was produced by Crafty version 15.12, using all
>>4 processors, with hash=24m, hashp=5m, and a time limit of 1 minute
>>per move.  The problem suite is broken down into groups of 20, with
>>problems 1-20 in column 1, 21-40 in column 2, etc., for the 300
>>problems in the Win At Chess suite.
>>
>>At the current time, Crafty still misses one position, number 230, which
>>has only been solved by Cray Blitz and Deep Thought/Deep Blue, to the
>>best of my knowledge.  If you drop this to 30 seconds, Crafty will also
>>miss 141 and 293.  To summarize, I see the following correct results at
>>various time limits:
>>
>>  time (sec)   correct
>>       1         276
>>       5         291
>>      10         293
>>      15         294
>>      30         297
>>      60         299
>>
>>Which looks pretty reasonable.
>>
>>Young Brothers Wait, if you haven't seen it, simply says not to search
>>any moves at a ply (in parallel) until at least one move has been
>>searched sequentially.  The obvious idea is to avoid search overhead,
>>where we might search moves before we have a good alpha/beta bound.
>>I've been playing with this and have found that a slight modification
>>is currently working even better for me, namely to not search in
>>parallel until *two* branches have been searched sequentially, which
>>improves the odds that if two moves have been searched, then *all*
>>moves will need to be searched.  I have not yet tried 3, 4, ..., but
>>am going to do so as this is easy and trivial to change.
>>
>>Still plenty to do and test, but performance is getting better.  It
>>really doesn't become efficient for most of the WAC suite, since those
>>positions are solved by 2-3-4-5 ply searches, which only take
>>milliseconds... so far the best speedups are happening at longer time
>>controls, but it is obviously effective even at game/1min type limits.
>>
>>More as new results and tests are available...
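
Finally, a compact sketch of where the Young Brothers Wait test quoted above
sits in a recursive search.  It is a plain negamax skeleton, not Crafty's
search; the types and helpers (GenerateMoves(), MakeMove(), Quiesce(),
IdleProcessorExists(), SearchRestInParallel(), and so on) are assumed to
exist for the example, and SERIAL_MOVES is 1 for the classic YBW rule or 2
for the variation described in the post.

#define SERIAL_MOVES   2       /* how many "elder brothers" to finish first */
#define INFINITE_SCORE 100000

typedef struct { int dummy; } POSITION;                /* placeholder types  */
typedef struct { int moves[256]; int count; } MOVELIST;

int  Quiesce(POSITION *p, int alpha, int beta);        /* assumed elsewhere  */
void GenerateMoves(POSITION *p, MOVELIST *list);
void MakeMove(POSITION *p, int move);
void UnmakeMove(POSITION *p, int move);
int  IdleProcessorExists(void);
int  SearchRestInParallel(POSITION *p, MOVELIST *list, int first,
                          int alpha, int beta, int depth);

int Search(POSITION *p, int alpha, int beta, int depth) {
  if (depth == 0) return Quiesce(p, alpha, beta);
  MOVELIST list;
  GenerateMoves(p, &list);
  int best = -INFINITE_SCORE;
  for (int i = 0; i < list.count; i++) {
    if (i >= SERIAL_MOVES && IdleProcessorExists()) {
      /* enough moves searched serially with no cutoff: let helpers take the
       * rest; the parallel search honors >= beta aborts internally          */
      int value = SearchRestInParallel(p, &list, i, alpha, beta, depth);
      return value > best ? value : best;
    }
    MakeMove(p, list.moves[i]);
    int value = -Search(p, -beta, -alpha, depth - 1);
    UnmakeMove(p, list.moves[i]);
    if (value > best) best = value;
    if (best > alpha) alpha = best;
    if (alpha >= beta) return best;        /* cutoff: no parallel work here  */
  }
  return best;
}

Raising SERIAL_MOVES trades a little extra idle time for a lower chance that
the parallel work turns out to be wasted, which is exactly the trade-off
measured above.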