Author: Jonathan Baxter
Date: 14:23:01 02/02/99
On February 01, 1999 at 12:27:00, Will Singleton wrote:

>This week TDChess pushed past the magic 2600 mark. Those who have been watching
>will have noted that TDChess improved in each of the last four weeks (while
>playing an average of about 450 games per week), starting at 2338 five weeks
>ago. Don't know whether this is due to program changes or the effects of TD
>lambda, perhaps Jon could enlighten us.

Well, the rating increase is due to 3 factors:

1: 4 weeks ago I realised I had left a compiler switch on that meant TDChess was playing with a quiescence search that only looks at promotions. Fixing that made a big difference.

2: Over Christmas, TDChess was playing with some really dumb and simple parameters in its eval, but again about 4 weeks ago I turned on the TD learning for a week's play and it improved dramatically, altering the parameters a lot.

3: TD learning slows the program down by a factor of 2 because you have to pass big structures around in the code. So I turned that off after a week (about 3 weeks ago) and just left the *opening book* learning on.

TDChess has quite a complicated way of learning its opening book, and note that it has absolutely zero stored games, so all the openings it plays it has learnt from scratch. Here is an excerpt from our ICCA Journal paper on KnightCap (which is the code TDChess is based on):

\section{Opening Learning}
\label{book}

KnightCap's lack of an opening book hampered its performance once its rating had improved to the level where it started playing sufficiently good opponents. The evaluation function is not sophisticated enough to prevent KnightCap from repeatedly playing opening blunders. Rather than supply KnightCap with an opening book of Grandmaster games, we tried implementing various opening-book learning methods based on the ``permanent brain'' idea in Crafty. Crafty's permanent brain is a set of losing positions and their evaluations that are inserted into the hash table at the start of every search.
The search will then avoid playing again into those bad lines. In our case we first tried inserting {\em all} positions that occurred during games, along with their backed-up values, into the brain (similar to \cite{scherzer90}). A consistency check was also performed to locate later entries that invalidated positions 1 or 2 ply earlier; if such entries were found, the earlier positions were re-searched (with the later positions and their values inserted into the hash table). Although this method did improve performance somewhat, there were two difficulties:
\begin{itemize}
\item Some entries inserted into the brain had negative backed-up values but were in fact perfectly playable positions. In such cases KnightCap would avoid the positions in future and never discover that they are actually acceptable.
\item It is nearly impossible to keep the brain consistent without spending large amounts of time re-searching the positions in the brain between games.
\end{itemize}
To circumvent these problems we implemented an alternative scheme. Instead of inserting all the positions in a game into the brain, KnightCap inserts only the {\em losing} position and the two subsequent positions. KnightCap inserts no positions if it won the game or drew against a stronger opponent. The losing position is defined to be the last position in the game for which the evaluation was above zero and for which the evaluations of the previous two moves were also above zero. By requiring at least three consecutive moves to be above zero we damp out some of the noise in the evaluation function. The three new positions are then re-searched in reverse order, to at least depth 7 or one ply deeper than the depth searched during the game, whichever is greater (this takes around 30 seconds for a middle-game position on a Pentium Pro 200MHz PC in blitz play). The reason for storing the two positions after the losing position is to guarantee that the new search on the losing position is aware of the low scores of the later positions.
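The rule for selecting which positions enter the brain can be sketched as follows. This is a hypothetical reconstruction in Python, not KnightCap's actual (C) code; the function and variable names (\verb|positions_to_insert|, \verb|positions|, \verb|evals|) are illustrative:

```python
# Hypothetical sketch (not KnightCap's real code) of choosing the
# positions to insert into the "brain" after a lost game.
# evals[i] is the program's backed-up evaluation after move i;
# positions[i] is the corresponding position.

def positions_to_insert(positions, evals):
    """Return the losing position plus the two positions that follow it.

    The losing position is the LAST position whose evaluation was above
    zero and whose two predecessors were also above zero; requiring three
    consecutive positive scores damps out evaluation noise.
    """
    losing = None
    for i in range(2, len(evals)):
        if evals[i] > 0 and evals[i - 1] > 0 and evals[i - 2] > 0:
            losing = i
    if losing is None:
        return []  # never had three consecutive positive scores
    # The two successors are stored so that a re-search of the losing
    # position sees their low scores (re-searched in reverse order).
    return positions[losing:losing + 3]
```

The returned list is simply the slice starting at the losing position, so if the game ended within two moves of the losing position, fewer than three positions are stored.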
One modification to this procedure is made in the case that the losing position occurs prior to a brain position that was actually encountered in the game. In that case we regard the final brain position as the losing position, insert its two successor positions, and re-search all three in reverse order.

Every time a position is inserted into the brain, a 2-ply consistency check is performed: every brain position is examined to see whether its recommended move leads to another brain position. If so, and if the second brain position's score contradicts that of the first, both positions are marked to be re-searched. If the second position is not in the brain but there is a move in that position leading to a brain position with a lower score than the first brain position, then again the second and first positions are marked for re-search. All entries requiring research after the consistency check are re-searched in reverse order. A new game is not played until the brain is completely ``clean'' (no inconsistencies are found).

With this opening learning strategy\footnote{This is not strictly an {\em opening} learning strategy as brain positions can come from any phase of the game.} KnightCap (playing under the pseudonym ``KnightC'' on ICC) has learnt many of the standard opening lines (KnightCap now has more than 10000 positions in its brain). Not being experts ourselves, we judge this by the fact that when playing versions of Crafty it is nearly always in Crafty's book for five full moves, and has been observed to be in book for 17 moves on one occasion. One nice thing about the opening learning is that KnightCap ``understands'' why certain opening moves are weak, because it has played the weak lines itself in determining this fact. Such information is not available in Grandmaster games: the ``correct moves'' are always there, but the computer does not necessarily know what to do if its opponent plays an incorrect move.
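A rough sketch of the 2-ply consistency check, again as a hypothetical Python reconstruction rather than KnightCap's actual code. It assumes negamax score conventions (a score is from the side to move, so it negates from ply to ply), models the contradiction test as exact negation for simplicity, and takes the move-generation helpers \verb|successor| and \verb|legal_moves| as assumed stand-ins:

```python
# Hypothetical sketch of the 2-ply consistency check described above.
# brain maps a position to (recommended_move, score); successor(pos, move)
# and legal_moves(pos) are assumed helpers standing in for real move
# generation. Scores follow the negamax convention.

def consistency_check(brain, successor, legal_moves):
    """Return the set of positions that must be re-searched."""
    dirty = set()
    for pos, (move, score) in brain.items():
        child = successor(pos, move)
        if child is None:
            continue
        if child in brain:
            _, child_score = brain[child]
            # One ply deeper the side to move flips, so the child's score
            # should be the negation of the parent's stored score.
            if -child_score != score:
                dirty.add(pos)
                dirty.add(child)
        else:
            # The child is not in the brain: look one further ply for a
            # brain position whose score undercuts the first position's.
            for m in legal_moves(child):
                gchild = successor(child, m)
                if gchild in brain and brain[gchild][1] < score:
                    dirty.add(pos)
                    dirty.add(child)  # intermediate position re-searched too
                    break
    return dirty
```

In the paper's scheme this check runs after every insertion, the dirty entries are re-searched in reverse order, and the cycle repeats until the returned set is empty (the brain is ``clean'').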
Sometimes, however, KnightCap will lose a large number of games before eventually ruling out a particular weak line of play. Note that we have not run book learning and evaluation-function learning simultaneously. The difficulty with applying both simultaneously is that as learning proceeds, older book entries need to be periodically re-searched because they were originally searched with an obsolete evaluation function. This requires even more time to be devoted to the management of the brain. An example of KnightCap's play with a large brain learnt through on-line play on ICS is shown in Figure \ref{crafty}.