Author: Stuart Cracraft
Date: 17:55:45 12/28/05
Deep Blue
Artificial Intelligence 134 (2002) 57
Murray Campbell, A. Joseph Hoane Jr., Feng-hsiung Hsu

Page 2 - Outline
Introduction, System overview, The chess chip, Software search, Hardware search, Parallel search, Evaluation function, Miscellaneous, Conclusion, References

Page 3 - Abstract
Deep Blue is the chess machine that defeated then-reigning World Chess Champion Garry Kasparov in a six-game match in 1997. Factors that contributed to this success:
- a single-chip chess search engine
- a massively parallel system with multiple levels of parallelism
- a strong emphasis on search extensions
- a complex evaluation function
- effective use of a Grandmaster game database

Page 4 - 1. Introduction
The Deep Blue computer chess system was developed at IBM Research during the mid-1990s. There are two distinct versions of Deep Blue: one lost to Garry Kasparov in 1996, and the other defeated him in 1997.
- 1996 version: Deep Blue I
- 1997 version: Deep Blue II
These notes refer primarily to the 1997 version, with comparisons to the 1996 version.

Page 5 - Introduction (cont.)
1.1. ChipTest and Deep Thought
These systems used a single-chip chess move generator [12] to achieve search speeds in the neighborhood of 500,000 positions per second (ChipTest) to 700,000 positions per second (Deep Thought). Deep Thought is described in detail in [16,17].

Page 6 - Introduction (cont.)
1.2. Deep Thought 2
Also known as the Deep Blue prototype. It used the same move-generator chip as Deep Thought, but had four improvements:
- Medium-scale multiprocessing: 24 chess engines.
- Enhanced evaluation hardware: the evaluation hardware used larger RAMs and was able to include a few additional features in the evaluation function.
- Improved search software: the search software was designed to deal better with the parallel search, and implemented a number of new search extension ideas.
- Extended book [6]: the extended book allowed Deep Thought 2 to make reasonable opening moves even in the absence of an opening book.

Page 7 - Introduction (cont.)
1.3. Deep Blue I
- Deep Blue I was based on a single-chip chess search engine.
- It ran on a 36-node IBM RS/6000 SP computer and used 216 chess chips.
- Each chip searched about 1.6-2 million chess positions per second.
- Overall search speed was 50-100 million chess positions per second.

Page 8 - Introduction (cont.)
1.4. Deep Blue II
First change: a new chess chip was designed.
- The evaluation function was redesigned, growing from around 6,400 features to over 8,000.
- Hardware repetition detection and a number of specialized move generation modes were added.
- Per-chip search speed increased to 2-2.5 million positions per second.
Second change:
- More than double the number of chess chips in the system.
- Use the newer generation of SP computer to support the higher processing demands.

Page 9 - Introduction (cont.)
Third change: development of a set of software tools to aid in debugging and match preparation, e.g., evaluation tuning and visualization tools.
Finally: having concluded that the searching ability of Deep Blue was acceptable, the team spent the vast majority of the time between the two matches designing, testing, and tuning the new evaluation function.

Page 10 - 2. System overview
Deep Blue is a massively parallel system designed for carrying out chess game tree searches. It is composed of a 30-node (30-processor) IBM RS/6000 SP computer and 480 single-chip chess search engines, with 16 chess chips per SP processor.
- 28 nodes have 120 MHz P2SC processors, and 2 nodes have 135 MHz P2SC processors.
- All nodes have 1 GB of RAM and 4 GB of disk.
- The chess chips are each capable of searching 2 to 2.5 million chess positions per second, and communicate with their host node via a Micro Channel bus.
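Note (illustration, not from the paper): as a quick back-of-the-envelope check using only the figures on this slide, the nominal peak throughput of the chip array is roughly a billion positions per second; the sustained system speeds quoted on the next slide are well below that, a gap the parallel-search discussion (Section 6) returns to. The arithmetic below is mine, not a calculation the authors report.

    # Rough nominal-throughput estimate from the slide's own figures.
    nodes = 30                      # SP processors
    chips_per_node = 16             # chess chips per SP processor
    chip_speed = (2.0e6, 2.5e6)     # positions per second per chip (low, high)

    total_chips = nodes * chips_per_node
    nominal_peak = tuple(s * total_chips for s in chip_speed)

    print(total_chips)              # 480
    print(nominal_peak)             # (960000000.0, 1200000000.0): roughly 1e9 positions/s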
Page 11 - System overview (cont.)
Deep Blue is organized in three layers. One of the SP processors is designated as the master, and the remainder as workers.
- Master: searches the top levels of the chess game tree, then distributes "leaf" positions to the workers for further examination.
- Workers: carry out a few levels of additional search, then distribute their leaf positions to the chess chips.
- Chess chips: search the last few levels of the tree.

Page 12 - System overview (cont.)
Overall system speed varied widely.
1996 version:
- Averaged about 100 million positions per second.
- For quieter positions, speeds close to 200 million positions per second were typical.
1997 version:
- The average system speed observed in searches longer than one minute was 126 million positions per second.
- The maximum sustained speed observed in that match was 330 million positions per second.

Page 13 - System overview (cont.)
Challenges:
- Large searching capacity (Sections 4 and 5): the search should be highly non-uniform, and should provide "insurance" against simple errors.
- Hardware evaluation (Section 7): the evaluation function is implemented in hardware, so the time to execute it is a fixed constant.
- Massively parallel search (Section 6): over 500 processors are available to participate in the game tree search.

Page 14 - System overview (cont.)
Hybrid software/hardware search: the Deep Blue search combines a software search with a hardware search.
- The software search is extremely flexible and can be changed as needed.
- The hardware search is parameterized, but the general form of the search is fixed.

Page 15 - 3. The chess chip
The chess chip divides into three parts: the move generator, the evaluation function, and the search control.
3.1. Move generation
- The move generator is implemented as an 8 × 8 array of combinatorial logic, which is effectively a silicon chessboard.
- A hardwired finite state machine controls move generation.
- Although the move generator produces only one move at a time, it implicitly computes all possible moves and selects one via an arbitration network. Computing all the moves simultaneously is one way to get minimum latency while generating moves in a reasonable order.

Page 16 - The chess chip (cont.)
3.2. Evaluation function
The evaluation function is composed of a "fast evaluation" and a "slow evaluation" [7]. This is a standard technique for skipping an expensive full evaluation when an approximation is good enough.
- Fast evaluation: computes a score for a chess position in a single clock cycle; it contains all the easily computed major evaluation terms with high values.
- Slow evaluation: scans the chessboard one column at a time, computing values for chess concepts such as king safety, pawn structure, and passed pawns.
- The features recognized in both the slow and fast evaluation functions have programmable weights, allowing their relative importance to be easily adjusted.
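Note (illustration, not from the paper): the fast/slow split is the familiar lazy-evaluation idea: pay for the expensive terms only when the cheap estimate cannot already settle the position. The slide does not give Deep Blue's exact criterion for "good enough", so the sketch below uses a common software formulation with a hypothetical margin and placeholder scoring functions; it illustrates the technique, not the chip's logic.

    def fast_eval(position):
        """Placeholder for the single-cycle fast evaluation (cheap, high-value terms)."""
        return position.get("material", 0)

    def slow_eval(position):
        """Placeholder for the column-by-column slow evaluation
        (king safety, pawn structure, passed pawns, ...)."""
        return position.get("positional", 0)

    LAZY_MARGIN = 150   # hypothetical bound on how much the slow terms can matter

    def evaluate(position, alpha, beta):
        """Lazy evaluation: if the fast estimate lies far enough outside the
        (alpha, beta) window that the slow terms cannot change the search
        outcome, return it as-is; otherwise add the expensive terms."""
        fast = fast_eval(position)
        if fast + LAZY_MARGIN <= alpha or fast - LAZY_MARGIN >= beta:
            return fast
        return fast + slow_eval(position)

    # Example: the fast score alone is decisive here, so slow_eval is skipped.
    print(evaluate({"material": 300, "positional": -40}, alpha=-50, beta=50))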
Page 17 - The chess chip (cont.)
3.3. Search control
The search control portion of the chip uses a number of state machines to implement null-window alpha-beta search.
- Advantages of null-window search: it eliminates the need for a value stack and simplifies the hardware design.
- Disadvantage of null-window search: in some cases it is necessary to do multiple searches.
- Limitation of the hardware search: the lack of a transposition table; the workaround is to search in software (move stack).

Page 18 - The chess chip (cont.)
3.4. Extendability
An external FPGA can be used to provide:
- access to an external transposition table,
- more complicated search control, and
- additional terms for the evaluation function.
Null move search was also explicitly supported by this method.

Page 19 - 4. Software search
Based on the experience with Deep Thought 1, a new selective search was built for Deep Thought 2, called "dual credit with delayed extensions".
Design principles:
- Extend forcing/forced pairs (ffp's) of moves.
- Forced moves are expectation dependent.
- Fractional extensions.
- Delayed extensions.
- Dual credit.
- Preserve the search envelope.

Page 20 - Software search (cont.)
4.1. Credit generation mechanisms
ffp's accumulate "credit", and only when sufficient credit is available can it be "cashed in" for an extension. There is a large set of mechanisms to identify nodes that should receive credit:
- Singular, binary, trinary, etc. Absolute singular: when there is only one legal move, a large credit can be given with very little risk.
- Threat, mate threat: it is relatively simple, using a null move search, to detect whether there is a threat in the current position. If a large threat exists, such as a threat to checkmate, a higher level of credit can be given.
- Influence: this mechanism gives credit for moves which are enabled by previous moves.
- Domain dependent.

Page 21 - Software search (cont.)
4.2. Sample behavior
- For a given iteration i, the software is assigned i - 4 ply, which represents the minimum depth searched in software.
- The maximum software depth is approximately three times the minimum depth.
- The last column of the sample table (not reproduced here) estimates the maximum depth reached in hardware and software combined; the estimate is based on results of simulating the hardware search.

Page 22 - Software search (cont.)
4.3. Miscellaneous
- Deep Blue scores are composed of two 16-bit signed integers: a regular search score and a tie-breaker score.
- For a draw, the regular search score is zero and the tie breaker contains either the static evaluation of a theoretically drawn position or the count of moves until repetition.
- "No progress" (a pruning mechanism) is implemented by detecting whether the current position could have been reached by playing an alternative move at some earlier position on the search path.

Page 23 - 5. Hardware search
- The hardware search takes place on the chess chip. A chess chip carries out a fixed-depth null-window search, which includes a quiescence search.
- Fast and simple: the chess chips are limited to carrying out only shallow searches, and the host processor controlling a chip is free to do other work while it runs.
- The host can carry out a binary search, initiating a series of null-window searches to determine the value (in some cases an exact value within a range is needed).
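Note (illustration, not from the paper): a null-window (zero-width) probe only answers "is the value at least gamma?", so the host's trick of pinning down an exact value by bisection can be shown with a small software sketch. The toy tree, bounds, and fail-soft negamax below are stand-ins, not the chip's state-machine search.

    def probe(node, gamma):
        """Null-window probe (fail-soft negamax over a toy tree of nested lists):
        answers whether the value of `node` is >= gamma, returning only a bound."""
        if not isinstance(node, list):          # leaf: static score, side to move's view
            return node
        best = -10_000
        for child in node:
            best = max(best, -probe(child, 1 - gamma))
            if best >= gamma:                   # fail high: cut off remaining moves
                break
        return best

    def exact_value(root, lo=-1000, hi=1000):
        """Host-side bisection from the slide: determine an exact value within
        a known range by issuing a series of null-window probes."""
        while lo < hi:
            gamma = (lo + hi + 1) // 2
            if probe(root, gamma) >= gamma:
                lo = gamma                      # value is at least gamma
            else:
                hi = gamma - 1                  # value is below gamma
        return lo

    # Toy tree: leaves are integer scores; the true negamax value here is 1.
    tree = [[3, -2], [5, 1], [-4, 7]]
    print(exact_value(tree))                    # -> 1

Roughly log2 of the value range in probes suffices, and each probe is far cheaper than a full-window search.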
Page 24 - Hardware search (cont.)
The main parameters of the hardware search:
- depth of search,
- depth of offset searches,
- endgame rules assertions off or on,
- endgame ROM assertions off or on,
- number of "mating" checks allowed for each side in the quiescence search,
- number of singular checking moves allowed in the quiescence search,
- flags to enable testing of singular, binary, or trinary conditions at the search root,
- a flag to ignore stalemate at one ply above quiescence,
- a flag to allow a one-ply extension in the quiescence search after a pawn moves to the 7th rank or, in some cases, the 6th rank,
- a flag to allow a one-ply extension in the quiescence search when the side to move has multiple hung pieces or a piece that is pinned and hung,
- a flag to allow a one-ply extension in the quiescence search when the side to move has one hung piece for consecutive plies, and
- a flag to allow a one-ply extension in the quiescence search if the opponent has any hung pieces.

Page 25 - 6. Parallel search
- Deep Blue is composed of a 30-node RS/6000 SP computer and 480 chess chips.
- The SP nodes communicate with each other using the MPI standard [10].
- The chess chips communicate with their host node via a Micro Channel bus.
- This heterogeneous architecture has a strong influence on the parallel search algorithm used in Deep Blue.

Page 26 - Parallel search (cont.)
6.1. Parallel search algorithm
- Processor hierarchy: a static processor tree, with one SP node controlling the other 29 nodes, which in turn control 16 chess chips each. The chess chips can only communicate directly with their host node.
- Control distribution: centralized control of the parallel search; the control is managed on the SP nodes.

Page 27 - Parallel search (cont.)
Parallelism possible:
- Type 1 (PV) nodes: after the first move has been examined at a PV node, all the alternatives may be examined in parallel.
- Good type 2 nodes: nodes where the first move "fails high", i.e., exceeds expectations.
- Bad type 2 nodes: nodes where the fail-high move is not searched first.
- Type 3 nodes: nodes where all the moves fail low.
Synchronization: type 1 and type 2 nodes are synchronization points for Deep Blue; the first move must be evaluated before parallelism is allowed.

Page 28 - Parallel search (cont.)
6.2. Parallel search implementation
- Load balancing: the search extensions algorithm leads to widely varying tree sizes for a given search depth. This extends all the way to the hardware and leads to severe load-balancing problems. Solution: abort long-running hardware searches and push more of the search into software.
- Master overload: ensure that each worker always has a job "on deck", ready to execute when it completes its active job.
- Sharing between nodes: workers generally pass their "expensive" transposition table results up to the master.

Page 29 - Parallel search (cont.)
6.3. Parallel search performance
It is difficult to assess the efficiency of the full 30-node Deep Blue system. Indirect evidence suggests an overall observed efficiency of about 8% in tactical positions and about 12% in quiet positions.
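Note (illustration, not from the paper): a toy of the layered work distribution described above, with a Python process pool standing in for the MPI workers and chess chips. The position encoding, scoring, and scheduling here are invented for the example and ignore the real system's load balancing and transposition-table sharing.

    from concurrent.futures import ProcessPoolExecutor

    def worker_search(leaf):
        """Stand-in for a worker node: a few plies of software search, then the
        chess chips finish the last few levels. Here the 'position' is just an
        integer and the 'score' placeholder arithmetic."""
        return leaf, -(leaf % 7)

    def master(leaf_positions, n_workers=4):
        """Stand-in for the master node: it has already searched the top of the
        tree (not shown) and now farms the resulting leaf positions out to keep
        the workers busy, then backs the best score up."""
        with ProcessPoolExecutor(max_workers=n_workers) as pool:
            scores = dict(pool.map(worker_search, leaf_positions, chunksize=4))
        return max(scores, key=scores.get)      # leaf with the best backed-up score

    if __name__ == "__main__":
        leaves = list(range(32))                # pretend these came from the master's search
        print(master(leaves))

The real system additionally keeps a second job "on deck" per worker and aborts overly long hardware searches, which a plain process pool does not capture.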
Page 30 - 7. Evaluation function
7.1. Overview
- The evaluation function is essentially a sum of feature values.
- Static values are set once at the beginning of a search.
- Dynamic values are also initialized at the beginning of a search, but are scaled during the search.
- The initialization is done by the "evaluation function generator", a sub-program run on the master node.
- The evaluation function generator runs only at the root of the search tree, and adjusts feature values based on the context of the root position.

Page 31 - Evaluation function (cont.)
There are 54 registers and 8,096 table entries, for a total of 8,150 parameters that can be set in the Deep Blue evaluation function.

Page 32 - (nothing)

Page 33 - Evaluation function (cont.)
7.2. Automated evaluation function analysis
- The first tool had the goal of identifying features in the Deep Blue I evaluation function that were "noisy", i.e., relatively insensitive to the particular weights chosen.
- The second tool was developed with the goal of tuning evaluation function weights.

Page 34 - 8. Miscellaneous
8.1. Opening book
- The opening book was created by hand.
- The book consisted of about 4,000 positions, and every position had been checked by Deep Blue in overnight runs.
- The openings were chosen to emphasize positions that Deep Blue played well.

Page 35 - Miscellaneous (cont.)
8.2. Extended book
The extended book [6] is a mechanism that allows a large Grandmaster game database to influence and direct Deep Blue's play in the absence of opening book information. It summarizes the information available at each position and uses the summary information to nudge Deep Blue in the consensus direction of chess opening theory.

Page 36 - Miscellaneous (cont.)
Factors that go into the extended book evaluation function:
- the number of times a move has been played,
- the relative number of times a move has been played,
- the strength of the players that play the move,
- the recentness of the move,
- the results of the move,
- commentary on the move, and
- game moves versus commentary moves (game moves are considered more reliable than commentary moves, and are thus likely to be better).
An ad hoc function was developed that combined these factors in a nonlinear way to produce a scalar value as output.

Page 37 - Miscellaneous (cont.)
8.3. Endgame databases
- The endgame databases include all chess positions with five or fewer pieces on the board, and selected six-piece positions that include a pair of blocked pawns.
- The databases were used both off-line and on-line.
- The off-line usage was during the design of the chess chips: each chip has a ROM which stores patterns to help evaluate certain frequently occurring chess endgames.
- The software search used the databases in on-line mode: each of the 30 general-purpose processors replicated the 4-piece and important 5-piece databases on its local disk.
- Endgames were stored in the databases with one bit per position, indicating lose or does-not-lose.

Page 38 - Miscellaneous (cont.)
8.4. Time control
Chess games typically have a set of requirements on the speed of play, termed the "time control". Before each search, two time targets are set:
- normal time target: approximately the time remaining to the next time control divided by the moves remaining,
- panic time target: roughly one third of the remaining time.

Page 39 - Miscellaneous (cont.)
At the normal time target, a time-out occurs and the current best move is played. Conditions under which Deep Blue goes beyond its normal target into "panic time":
- the current best move has dropped 15 points or more from the score of the previous iteration,
- the best move from the previous iteration is in a potential "fail-low" state, or
- a new move is in a potential "fail-high" state.
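Note (illustration, not from the paper): the two time targets and the conditions for exceeding the normal one reduce to simple arithmetic and flags. The sketch below uses only the figures quoted on these slides; the function names and example numbers are mine.

    def time_targets(seconds_remaining, moves_to_time_control):
        """Normal target: remaining time spread over the remaining moves.
        Panic target: roughly one third of all remaining time."""
        normal = seconds_remaining / max(1, moves_to_time_control)
        panic = seconds_remaining / 3.0
        return normal, panic

    def keep_searching_past_normal(score_drop, best_may_fail_low, new_move_may_fail_high):
        """Per the slides, Deep Blue goes beyond the normal target into panic time
        if the best move dropped 15 or more points from the previous iteration,
        the previous best move may be failing low, or a new move may be failing high."""
        return score_drop >= 15 or best_may_fail_low or new_move_may_fail_high

    # Example: 45 minutes left and 15 moves to the next time control.
    print(time_targets(45 * 60, 15))                      # (180.0, 900.0) seconds
    print(keep_searching_past_normal(20, False, False))   # True: large score drop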
Page 40 - 9. Conclusion
There were many areas where improvements could have been made:
- parallel search efficiency,
- the hardware search and evaluation could have been made more efficient and flexible,
- evaluation function tuning, both automatic and manual, was far from complete, and
- the addition of pruning mechanisms to Deep Blue might have significantly improved the search.

Page 41 - Summary of Deep Blue and predecessors (not included)