Computer Chess Club Archives


Subject: How PC Chess Programs Are Developed

Author: Dana Turnmire

Date: 22:53:52 05/08/01


  Here is an interesting article from CCR Vol. 3 No. 2 (Late 1992/Early 1993) by
Larry Kaufman.  I thought it was interesting what he had to say about "default
settings."

  "Don Dailey (professional programmer and 1800 ballpark chess amateur) and
myself (amateur programmer but expert on computer algorithms, rating 2486) have
worked together now for several years developing four distinct chess programs
(Rex, Alpha, Socrates, and Titan), each clearly better than its predecessor when
allowance is made for the fact that Alpha and Titan are written in "C" while Rex
and Socrates are in Assembly language, which is faster.  I thought it might be
interesting to readers to get an idea of how we go about improving our programs.
  First of all, to determine if a change is an improvement, we need some method
for judging this.  If the change is simply a small scoring change to correct
some poor move we have observed, it must be left to my judgement, as such
changes are apt to be worth only a fraction of a rating point, which
could never be measured.  But let's assume we are talking about some fairly
major change to the program.  The first thing we do after such a change is to
run it on a problem set, such as the one published in this and the preceding
issue of CCR. This gives us some idea of the effects of the change, but we must
be very careful.  The problem is that some changes may speed up the solution of
most or even all problems, yet have a subtle degrading effect on the positional
play.  Or, it may slow down most of the problems a bit while subtly improving
the positional play.  So the next step is automated testing, of which there are
two types.
  The first method we call "self-testing," in which the program with the change
plays against the same program without it, all on one computer.  We use a fixed
set of 100 openings that stop after five moves per side.  Each version gets
white once and black once in each opening, so if time permits a two-hundred-game
match may be played.  Since the computer is fully dedicated to whichever side is
on move, no thinking on the opponent's time is possible.  Presumably this should
affect both sides equally, but this may not always be true.  If the nature of
the change is such as to affect the speed of the program, we must test it on an
equal time basis; if it does not affect speed, we often test on fixed depth
searches to minimize the luck factor.  Since we have five 486 computers between
us, if we set them at different levels we can play a thousand games to evaluate
a change without ever having to touch anything once the test has begun.
Ideally, it would be nice to test at tournament time controls, but since we
always have so many ideas to test we find it necessary to do most testing at
levels ranging from 5" to 30" per move.  The luck factor is so great that we
must normally run several hundred games to have any confidence in the result,
though some changes are dramatic enough to prove themselves quickly.  Some
people have argued that self-testing is not very reliable because the program
may not know how to punish its own positional errors, and because a change that
speeds up the program (at some price) may score better against its "brother"
than it should.  We have not generally found these factors to be much of a
problem, but we do have an alternative method of testing.
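
  (Illustration only, not part of Kaufman's article.) The self-testing setup
described above can be sketched in a few lines of Python. The function names,
the "new"/"old" labels, and the normal-approximation error margin are all
assumptions made for this sketch; the article does not say how the authors
scored or analysed their matches.

```python
import math

def self_test_schedule(num_openings=100):
    """Return (game_number, opening_index, version_playing_white) tuples.

    Each opening is played twice so that each version gets White once.
    """
    schedule = []
    for opening in range(num_openings):
        for white in ("new", "old"):
            schedule.append((len(schedule) + 1, opening, white))
    return schedule

def elo_estimate(wins, draws, losses, z=1.96):
    """Elo difference implied by a match score, with a rough z-sigma interval.

    Counts are from the point of view of the changed ("new") version.
    Uses the standard logistic Elo curve and a normal approximation.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2) / n
    margin = z * math.sqrt(variance / n)

    def to_elo(s):
        s = min(max(s, 1e-6), 1.0 - 1e-6)  # keep the logarithm finite
        return 400.0 * math.log10(s / (1.0 - s))

    return to_elo(score), to_elo(score - margin), to_elo(score + margin)

schedule = self_test_schedule()
print(len(schedule))              # 100 openings x 2 colours
print(elo_estimate(80, 50, 70))   # a hypothetical 200-game result
```

With the made-up result of 80 wins, 50 draws and 70 losses, the interval
returned by elo_estimate still straddles zero, which is one way to see why
several hundred games are usually needed.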
  We call this second method "auto-testing."  We cable two identical computers
together, and use a "referee" program that Don wrote to allow our program to
play a totally different one.  Since he must write a separate program for each
opponent we wish to test against, we only use a few opponents.  At this time we
can test against MChess, Zarkov, Fritz (1), and against our own earlier programs
(i.e. Titan vs. Socrates).  This has the advantage of allowing both programs to
think on opponent's time, and is not subject to the criticisms mentioned above
against self-testing.  The disadvantage is that to judge whether a change is
beneficial requires twice as many games since each version must be auto-tested
against the same opponent.  I don't know which way is really better overall; we
use both methods.
  Now let's talk about how we actually try to improve the program.  One way is
by "rule-base" changes.  Don created a chess programming language that allowed
me to write hundreds of rules that the program processes before beginning its
search.  Based on these rules it decides where it would like to try to place its
pieces, other things being equal.  For example, if there are a lot of pieces on
the board, the program is heavily penalized for allowing its king to be brought
out to the center of the board, but if it is an endgame, it is rewarded for
centralizing its king.  There are bonuses for centralizing knights, for putting
bishops on long diagonals, for occupying holes, for bringing the queen near to
the enemy king, for centralizing pawns (i.e. capture towards the center), and
countless other bonuses and penalties.  If I observe bad play that can be easily
corrected by this rulebase, I do so.  However, other parts of the evaluation,
such as pawn structure and mobility, must be calculated at the end of each
variation searched, and so can only be changed by Don since they are not part of
the "rule-base."  There are also many parameters of the search itself which I
can adjust and test at will.  For example, the number of selective plies may be
varied, the degree of selectivity on each selective ply may be set, checks in
the quiescence search may be turned on or off, certain short-cuts we take may be
turned on or off, and so on.  We expect to leave in enough of these options so
that those who purchase Socrates may have the fun of self-testing various
possibilities themselves.  Perhaps someone will prove that our default settings
are in fact not the best!  We always have the fear that some choice we make
based on relatively fast games will prove to be the wrong choice at 40/2, so if
anyone who gets Socrates wants to self-test any parameters at 2 or 3 minute
levels on a 486 we would like to hear the result of the 200 games!  Testing can
be stopped after any game and re-started when the machine is free at the game
number corresponding to the number of games already played.
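
  (Illustration only, not part of Kaufman's article.) The king rule mentioned
above, processed once before the search begins, might look roughly like the
Python sketch below. The square numbering (a1 = 0, h8 = 63), the piece-count
threshold, and the weight are hypothetical values chosen for the example; the
actual rule language Don Dailey wrote is not described in the article.

```python
def centrality(square):
    """Closeness to the centre: 0 on a corner, 6 on d4/e4/d5/e5."""
    file, rank = square % 8, square // 8
    return min(file, 7 - file) + min(rank, 7 - rank)

def king_table(non_pawn_pieces, endgame_threshold=4, weight=10):
    """Build a 64-entry table of king placement bonuses before searching.

    With many pieces on the board a central king is penalised; once the
    piece count drops to the (assumed) endgame threshold it is rewarded.
    """
    sign = 1 if non_pawn_pieces <= endgame_threshold else -1
    return [sign * weight * centrality(sq) for sq in range(64)]

E4 = 28  # rank index 3, file index 4 under the a1 = 0 numbering
print(king_table(non_pawn_pieces=12)[E4])  # middlegame: penalty
print(king_table(non_pawn_pieces=2)[E4])   # endgame: bonus
```

Because the table is computed once per search rather than at every leaf, a
rule of this kind costs almost nothing at search time, which fits the split
the article draws between the rule-base and the end-of-variation evaluation.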
  We occasionally try radical changes to the program if we can see some
rationale.  Often this means some sort of extension for some certain class of
moves, such as certain recaptures or replies to certain threats.  Usually the
extension does not prove to be a benefit due to the general slowing down of the
program it entails, but a few extensions have proven their worth and are
retained.  We keep hoping to hit on the one quick change that will add a hundred
points to the program, but the reality is that our progress has mostly been
achieved in tiny increments of 2-5 points at a time, which are really too small
to be measured with any certainty.  But they do add up!"
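
  As a rough check on why gains of 2-5 points are so hard to measure, the
following Python sketch estimates how many games it takes before an
improvement of a given size stands out from chance. It assumes the standard
logistic Elo curve, a per-game standard deviation of 0.4 points (a guess that
depends on the draw rate), and a normal approximation; none of these figures
come from the article.

```python
import math

def expected_score(elo_diff):
    """Expected score per game for a player who is elo_diff points stronger."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def games_needed(elo_diff, per_game_sd=0.4, z=1.96):
    """Games needed before the expected edge exceeds z standard errors.

    elo_diff must be positive; per_game_sd is an assumed value.
    """
    edge = expected_score(elo_diff) - 0.5
    return math.ceil((z * per_game_sd / edge) ** 2)

for gain in (2, 5, 100):
    print(gain, games_needed(gain))
```

Under these assumptions a 100-point gain shows up within a few dozen games,
while a 5-point gain needs more than ten thousand, far beyond the matches of
a few hundred games described in the article.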




Last modified: Thu, 15 Apr 21 08:11:13 -0700
