Author: Don Dailey
Date: 14:07:58 06/10/98
On June 08, 1998 at 23:17:25, Robert Hyatt wrote:
>I have been experimenting with the SMP search stuff in Crafty, as most
>of you know. Thought I would give a brief update of the current status,
>just for information.
>
>Before going to the "full Cray Blitz" type of search, I wanted to play
>around with the Young Brother's Wait (Feldman, et al) algorithm, just
>to see what it is capable of producing. It has been working pretty
>well, although I have found some interesting angles that need more
>study.
Hi Bob,

I am surprised to hear you were not already using this algorithm; I
had assumed all good parallel programs did. You have reported
performance numbers in the past that seemed right, so perhaps your
algorithm achieves the same effect through a different mechanism. I
often marvel at how often this happens in computer science.

I need to re-read your last article describing your parallelism in
the ICCA Journal. When I first looked at it, it seemed unduly
complicated compared to what I'm doing, so I didn't try too hard to
understand it. Our stuff is very simple and gives the same numbers
you reported, so I chalked it up to the "different roads all lead to
Rome" phenomenon. But if you were not using the YBW stuff, I'm
curious how it works so well.
Our algorithm is simply this, applied recursively at every node
(except in the quiescence search):

  1. Search the first move serially (taking the cutoff if possible).
  2. Spawn all siblings of the first move in parallel. If any of
     them returns >= beta, abort the work at this level.

"Spawn" really just means "post work"; the work rarely migrates to
another processor in practice (even with 32 processors and millions
of nodes per second, not many tasks actually move).
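To make that concrete, here is a minimal sketch of the idea in
Cilk-style C (using the later cilk_spawn/cilk_sync spelling of Cilk
Plus/OpenCilk rather than the original MIT Cilk keywords). The engine
primitives (Position, gen_moves, make_move, evaluate, search_move) are
hypothetical placeholders, not code from our program or from Crafty,
and the abort-on-beta plumbing is left out:

#include <cilk/cilk.h>

typedef struct Position Position;        /* hypothetical engine types */
int      gen_moves(const Position *p, int moves[]);
Position make_move(const Position *p, int move);
int      evaluate(const Position *p);

int search(const Position *pos, int depth, int alpha, int beta);

/* Helper so every spawned sibling gets its own child position. */
static int search_move(const Position *parent, int move, int depth,
                       int alpha, int beta)
{
    Position child = make_move(parent, move);
    return -search(&child, depth - 1, -beta, -alpha);
}

int search(const Position *pos, int depth, int alpha, int beta)
{
    int moves[256], score[256];
    int n = gen_moves(pos, moves);
    if (depth == 0 || n == 0)
        return evaluate(pos);            /* quiescence search omitted */

    /* 1. Search the first move serially; take the cutoff if possible. */
    int best = search_move(pos, moves[0], depth, alpha, beta);
    if (best >= beta)
        return best;
    if (best > alpha)
        alpha = best;

    /* 2. Spawn the remaining siblings.  The runtime's work stealing
       decides whether any of them actually migrate; a real engine
       also aborts the others as soon as one returns >= beta. */
    for (int i = 1; i < n; i++)
        score[i] = cilk_spawn search_move(pos, moves[i], depth, alpha, beta);
    cilk_sync;

    for (int i = 1; i < n; i++)
        if (score[i] > best)
            best = score[i];
    return best;
}

Note that the spawned siblings are all searched with the bound
established by the first serial move, which is where waiting for the
first brother pays off: by the time we split, we usually have a
decent alpha.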
We don't think or care about how many processors there are. Our
parallelism is based on work stealing: a processor that runs out of
work randomly polls other processors and grabs work from their
queues, oldest work first. There is no master process, everything is
very clean and orderly, and the code looks almost identical to the
serial code. All the scheduling is done by the run-time system (Cilk
is really C with minor extensions added). Cilk is very fast and
tight, and was designed for computer chess, although it's now used
for many things.
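The stealing discipline is the whole trick, so here is a toy
illustration of it: a mutex-protected queue (not MIT Cilk's actual
lock-free scheduler), where the owner pushes and pops at the young
end, so it keeps running depth-first just like the serial program,
while an idle processor that polls this queue takes the oldest task,
which is usually a big subtree near the root. The names (WorkQueue,
Task, wq_push, wq_pop, wq_steal) are made up for the sketch:

#include <pthread.h>
#include <stddef.h>

#define WQ_CAP 1024

typedef struct { void (*run)(void *); void *arg; } Task;

typedef struct {
    Task            buf[WQ_CAP];
    size_t          head, tail;   /* head = oldest task, tail = next free */
    pthread_mutex_t lock;
} WorkQueue;

void wq_init(WorkQueue *q)
{
    q->head = q->tail = 0;
    pthread_mutex_init(&q->lock, NULL);
}

/* Owner posts freshly spawned work at the young end. */
int wq_push(WorkQueue *q, Task t)
{
    pthread_mutex_lock(&q->lock);
    int ok = (q->tail - q->head) < WQ_CAP;
    if (ok)
        q->buf[q->tail++ % WQ_CAP] = t;
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/* Owner takes back its own newest task: depth-first, like serial code. */
int wq_pop(WorkQueue *q, Task *t)
{
    pthread_mutex_lock(&q->lock);
    int ok = q->tail > q->head;
    if (ok)
        *t = q->buf[--q->tail % WQ_CAP];
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/* A thief that ran out of work grabs the victim's oldest task. */
int wq_steal(WorkQueue *q, Task *t)
{
    pthread_mutex_lock(&q->lock);
    int ok = q->tail > q->head;
    if (ok)
        *t = q->buf[q->head++ % WQ_CAP];
    pthread_mutex_unlock(&q->lock);
    return ok;
}

The real runtime does this with far less synchronization overhead
than a lock per operation, but the oldest-work-first order is the
point here.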
Sorry about the sales pitch! Anyway, I'm interested in
looking at your algorithm. Do you think you are essentially
doing the same thing I do or is your stuff fundamentally
different?
- Don
>First some results. This run on my ALR 4 processor machine, with the
>current version of Crafty:
>
> 0 20 40 60 80 100 120 140 160 180 200 220 240 260 280
> +------------------------------------------------------------
> 1 | 0 0 7 0 0 1 0 31 0 0 0 0 1 0 0
> 2 | 0 0 0 0 0 0 0 0 0 0 0 4 0 1 0
> 3 | 0 0 0 0 0 0 0 0 2 0 0 29 4 0 4
> 4 | 0 0 0 0 0 0 0 0 0 0 0 0 0 17 0
> 5 | 0 0 0 0 0 0 0 2 0 0 0 0 0 5 0
> 6 | 0 11 0 1 1 0 0 0 0 0 0 0 0 0 0
> 7 | 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1
> 8 | 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 9 | 0 0 0 0 0 0 0 0 0 0 0 9 0 20 0
>10 | 0 0 0 0 0 0 0 0 0 1 0 -- 0 0 0
>11 | 0 0 0 0 5 0 1 0 0 0 0 0 0 0 3
>12 | 0 0 0 0 8 0 0 0 0 0 0 0 1 0 0
>13 | 0 0 0 0 0 0 0 0 0 0 0 0 0 0 42
>14 | 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
>15 | 0 0 2 0 0 0 0 3 0 0 0 2 0 0 0
>16 | 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0
>17 | 0 0 0 0 0 0 0 0 0 0 0 2 0 0 1
>18 | 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
>19 | 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>20 | 0 0 0 1 3 0 0 0 1 4 0 0 0 0 0
>incorrect: 230
>sum of times(squared)=19155
>
>the above table was produced by Crafty version 15.12, using all
>4 processors, with hash=24m, hashp=5m, and a time limit of 1 minute
>per move. the problem suite is broken down into groups of 20, with
>1-20 in column 1, 21-40 in column 2, etc... for the 300 problems in
>the Win At Chess suite.
>
>At the current time, Crafty still misses 1, position 230, which has
>only been solved by Cray Blitz and Deep Thought/Deep Blue, to the best
>of my knowledge. If you drop this to 30 seconds, crafty will also miss
>141 and 293. to summarize, I see the following correct results at
>various time limits:
>
>time correct
> 1 276
> 5 291
> 10 293
> 15 294
> 30 297
> 60 299
>
>Which looks pretty reasonable.
>
>Young Brother's Wait, if you haven't seen it, simply says to not search
>any moves at a ply (in parallel) until at least one move has been
>searched sequentially. The obvious idea is to avoid search overhead
>where we might search moves before we have a good alpha/beta bound.
>I've been playing with this and have found that a slight modification
>is currently working even better for me, namely to not search in
>parallel until *two* branches have been searched sequentially, which
>improves the odds that if two moves have been searched, then *all*
>moves will need to be searched. I have not yet tried 3, 4, ..., but am
>going to do so as this is easy and trivial to change.
>
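In terms of the hypothetical sketch near the top of this page, the
variation described above just turns the serial prefix into a
parameter: search the first SERIAL_MOVES siblings one after another,
taking any cutoff, and only spawn the rest once none of them has
failed high. A sketch of that condition (SERIAL_MOVES, ybw_search,
ybw_child and the engine primitives are again made-up names, not
Crafty's code):

#include <cilk/cilk.h>

#define SERIAL_MOVES 2      /* 1 = classic Young Brother's Wait */

typedef struct Position Position;        /* hypothetical engine types */
int      gen_moves(const Position *p, int moves[]);
Position make_move(const Position *p, int move);
int      evaluate(const Position *p);

int ybw_search(const Position *pos, int depth, int alpha, int beta);

static int ybw_child(const Position *parent, int move, int depth,
                     int alpha, int beta)
{
    Position child = make_move(parent, move);
    return -ybw_search(&child, depth - 1, -beta, -alpha);
}

int ybw_search(const Position *pos, int depth, int alpha, int beta)
{
    int moves[256], score[256];
    int n = gen_moves(pos, moves);
    if (depth == 0 || n == 0)
        return evaluate(pos);

    int best = -32000, i;

    /* No parallelism at this node until SERIAL_MOVES branches have
       been searched sequentially and none of them failed high. */
    for (i = 0; i < n && i < SERIAL_MOVES; i++) {
        int s = ybw_child(pos, moves[i], depth, alpha, beta);
        if (s >= beta)
            return s;                    /* cutoff: never split here */
        if (s > best)  best = s;
        if (s > alpha) alpha = s;
    }

    /* Only now may the remaining siblings be searched in parallel. */
    for (; i < n; i++)
        score[i] = cilk_spawn ybw_child(pos, moves[i], depth, alpha, beta);
    cilk_sync;

    for (i = SERIAL_MOVES; i < n; i++)
        if (score[i] > best)
            best = score[i];
    return best;
}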
>Still plenty to do and test, but performance is getting better. It
>really doesn't become efficient for most of the WAC suite since they
>are solved by 2-3-4-5 ply searches, which only take milliseconds... so
>far the best speedups are happening at longer time controls, but it is
>obviously effective even at game/1min type limits.
>
>More as new results and tests are available...