Author: Jonathan Baxter
Date: 14:23:01 02/02/99
On February 01, 1999 at 12:27:00, Will Singleton wrote:

>This week TDChess pushed past the magic 2600 mark. Those who have been watching
>will have noted that TDChess improved in each of the last four weeks (while
>playing an average of about 450 games per week), starting at 2338 five weeks
>ago. Don't know whether this is due to program changes or the effects of TD
>lambda, perhaps Jon could enlighten us.

Well, the rating increase is due to 3 factors:

1: 4 weeks ago I realised I had left a compiler switch on that meant TDChess was playing with a quiescence search that only looks at promotions. Fixing that made a big difference.

2: Over Christmas, TDChess was playing with some really dumb and simple parameters in its eval, but again about 4 weeks ago I turned on the TD learning for a week's play and it improved dramatically, altering the parameters a lot.

3: TD learning slows the program down by a factor of 2 because you have to pass big structures around in the code. So I turned that off after a week (about 3 weeks ago) and just left the *opening book* learning on.

TDChess has quite a complicated way of learning its opening book, and note that it has absolutely zero stored games, so all the openings it plays it has learnt from scratch. Here is an excerpt from our ICCA Journal paper on KnightCap (which is the code TDChess is based on):

\section{Opening Learning}
\label{book}

KnightCap's lack of an opening book hampered its performance once its rating had improved to the level where it started playing sufficiently good opponents. The evaluation function is not sophisticated enough to prevent KnightCap from repeatedly playing opening blunders. Rather than supply KnightCap with an opening book of Grandmaster games, we tried implementing various opening-book learning methods based on the ``permanent brain'' idea in Crafty. Crafty's permanent brain is a set of losing positions and their evaluations that are inserted into the hash table at the start of every search.
The search will then avoid playing again into those bad lines. In our case we first tried inserting {\em all} positions that occurred during games, along with their backed-up values, into the brain (similar to \cite{scherzer90}). A consistency check was also performed to locate later entries that invalidated positions 1 or 2 ply earlier; if such entries were found, the earlier positions were re-searched (with the later positions and their values inserted into the hash table). Although this method did improve performance somewhat, there were two difficulties:
\begin{itemize}
\item Some entries inserted into the brain had negative backed-up values but were in fact perfectly playable positions. In such cases KnightCap would avoid the positions in future and never discover that they are actually acceptable.
\item It is nearly impossible to keep the brain consistent without spending large amounts of time re-searching the positions in the brain between games.
\end{itemize}
To circumvent these problems we implemented an alternative scheme. Instead of inserting all the positions in a game into the brain, KnightCap inserts only the {\em losing} position and the two subsequent positions. KnightCap inserts no positions if it won the game or drew against a stronger opponent. The losing position is defined to be the last position in the game for which the evaluation was above zero and for which the evaluations of the previous two moves were also above zero. By requiring at least three consecutive moves to be above zero we damp out some of the noise in the evaluation function. The three new positions are then re-searched in reverse order, to at least depth 7 or one ply deeper than the depth searched during the game, whichever is greater (this takes around 30 seconds for a middle-game position on a Pentium Pro 200MHz PC in blitz play). The reason for storing the two positions after the losing position is to guarantee that the new search on the losing position is aware of the low scores of the later positions.
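The rule for selecting which positions enter the brain can be sketched as follows. This is a hypothetical reconstruction in Python, not KnightCap's actual (C) code; the function and variable names (\verb|positions_to_insert|, \verb|positions|, \verb|evals|) are illustrative:

```python
# Hypothetical sketch (not KnightCap's real code) of choosing the
# positions to insert into the "brain" after a lost game.
# evals[i] is the program's backed-up evaluation after move i;
# positions[i] is the corresponding position.

def positions_to_insert(positions, evals):
    """Return the losing position plus the two positions that follow it.

    The losing position is the LAST position whose evaluation was above
    zero and whose two predecessors were also above zero; requiring three
    consecutive positive scores damps out evaluation noise.
    """
    losing = None
    for i in range(2, len(evals)):
        if evals[i] > 0 and evals[i - 1] > 0 and evals[i - 2] > 0:
            losing = i
    if losing is None:
        return []  # never had three consecutive positive scores
    # The two successors are stored so that a re-search of the losing
    # position sees their low scores (re-searched in reverse order).
    return positions[losing:losing + 3]
```

The returned list is simply the slice starting at the losing position, so if the game ended within two moves of the losing position, fewer than three positions are stored.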
One modification to this procedure is made in the case that the losing position occurs prior to a brain position that was actually encountered in the game. In that case we regard the final brain position as the losing position, insert its two successor positions, and re-search all three in reverse order.

Every time a position is inserted into the brain, a 2-ply consistency check is performed: every brain position is examined to see whether its recommended move leads to another brain position. If so, and if the second brain position's score contradicts that of the first, both positions are marked to be re-searched. If the second position is not in the brain but there is a move in that position leading to a brain position with a lower score than the first brain position, then again the second and first positions are marked for re-search. All entries requiring research after the consistency check are re-searched in reverse order. A new game is not played until the brain is completely ``clean'' (no inconsistencies are found).

With this opening learning strategy\footnote{This is not strictly an {\em opening} learning strategy as brain positions can come from any phase of the game.} KnightCap (playing under the pseudonym ``KnightC'' on ICC) has learnt many of the standard opening lines (KnightCap now has more than 10000 positions in its brain). Not being experts ourselves, we judge this by the fact that when playing versions of Crafty it is nearly always in Crafty's book for five full moves, and has been observed to be in book for 17 moves on one occasion. One nice thing about the opening learning is that KnightCap ``understands'' why certain opening moves are weak, because it has played the weak lines itself in determining this fact. Such information is not available in Grandmaster games: the ``correct moves'' are always there, but the computer does not necessarily know what to do if its opponent plays an incorrect move.
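A rough sketch of the 2-ply consistency check, again as a hypothetical Python reconstruction rather than KnightCap's actual code. It assumes negamax score conventions (a score is from the side to move, so it negates from ply to ply), models the contradiction test as exact negation for simplicity, and takes the move-generation helpers \verb|successor| and \verb|legal_moves| as assumed stand-ins:

```python
# Hypothetical sketch of the 2-ply consistency check described above.
# brain maps a position to (recommended_move, score); successor(pos, move)
# and legal_moves(pos) are assumed helpers standing in for real move
# generation. Scores follow the negamax convention.

def consistency_check(brain, successor, legal_moves):
    """Return the set of positions that must be re-searched."""
    dirty = set()
    for pos, (move, score) in brain.items():
        child = successor(pos, move)
        if child is None:
            continue
        if child in brain:
            _, child_score = brain[child]
            # One ply deeper the side to move flips, so the child's score
            # should be the negation of the parent's stored score.
            if -child_score != score:
                dirty.add(pos)
                dirty.add(child)
        else:
            # The child is not in the brain: look one further ply for a
            # brain position whose score undercuts the first position's.
            for m in legal_moves(child):
                gchild = successor(child, m)
                if gchild in brain and brain[gchild][1] < score:
                    dirty.add(pos)
                    dirty.add(child)  # intermediate position re-searched too
                    break
    return dirty
```

In the paper's scheme this check runs after every insertion, the dirty entries are re-searched in reverse order, and the cycle repeats until the returned set is empty (the brain is ``clean'').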
Sometimes, however, KnightCap will lose a large number of games before eventually ruling out a particular weak line of play. Note that we have not run book learning and evaluation-function learning simultaneously. The difficulty with applying both simultaneously is that as learning proceeds, older book entries need to be periodically re-searched because they were originally searched with an obsolete evaluation function. This requires even more time to be devoted to the management of the brain. An example of KnightCap's play with a large brain learnt through on-line play on ICS is shown in Figure \ref{crafty}.