Computer Chess Club Archives



Subject: Information on implementation

Author: Dan Homan

Date: 05:41:09 07/02/00


I basically just followed the information in the papers by the KnightCap
research group.  They give a very clear description of TDLeaf learning.

In general, all you need is a record of the terminal search positions (i.e.
the leaf positions from your PV) for each move your engine made during the
game.  You should also keep a record of which of the opponent's moves were
correctly predicted during pondering.

With this list of terminal search positions, you use your scoring
function to calculate incremental differences between scores as the game
progressed, e.g. d[i] = score[i+1] - score[i].  The KnightCap authors
smooth the scoring function with a hyperbolic tangent (tanh), so
d[i] = n[i+1] - n[i], where n[i] = tanh(0.255*score[i]) for score[i]
given in units of 1 pawn.
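As a rough sketch of the smoothing and differencing step (the function and
list names here are my own invention, not from the KnightCap papers):

```python
import math

def smooth(score_pawns):
    # Squash a raw evaluation (in pawns) into (-1, 1); 0.255 is the
    # constant quoted above for scores given in units of one pawn.
    return math.tanh(0.255 * score_pawns)

def temporal_differences(leaf_scores):
    # leaf_scores: raw PV-leaf evaluations, one per move of the game.
    n = [smooth(s) for s in leaf_scores]
    return [n[i + 1] - n[i] for i in range(len(n) - 1)]
```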

You also need to compute the derivative with respect to each evaluation
parameter at each position.  This is trivial to do: add some small
amount to the parameter in question, recompute the smoothed score n[i],
subtract the unmodified n[i], and divide by the small amount you added to
the parameter.
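The finite-difference recipe above might look like this, using a toy linear
evaluation as a hypothetical stand-in for a real engine's eval:

```python
import math

BETA = 0.255  # smoothing constant from above; scores in pawns

def smoothed_score(features, params):
    # Toy stand-in for an engine's evaluation: a linear function of
    # position features, squashed by tanh.  Purely illustrative.
    raw = sum(f * p for f, p in zip(features, params))
    return math.tanh(BETA * raw)

def numeric_gradient(features, params, eps=1e-4):
    # Finite differences: bump each parameter by eps, rescore the
    # position, subtract the unmodified score, and divide by eps.
    base = smoothed_score(features, params)
    grad = []
    for j in range(len(params)):
        bumped = list(params)
        bumped[j] += eps
        grad.append((smoothed_score(features, bumped) - base) / eps)
    return grad
```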

Finally, you need to do a double sum over a combination of these computed
numbers (score differences and derivatives).  This double sum, as well as
the detailed formulas, is given in the ICCA papers by the KnightCap
group.  I got PostScript versions from the "Machine Learning in Games"
website listed in the computer chess resource center on this site.
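For the double sum, a plausible sketch of the TDLeaf-style update (check the
ICCA papers for the exact formulas; alpha and lam are placeholder choices):

```python
def tdleaf_updates(gradients, diffs, alpha=0.1, lam=0.7):
    # gradients[t][k]: derivative of the smoothed leaf score at move t
    #   with respect to evaluation parameter k (finite differences).
    # diffs[t]: temporal difference d[t] = n[t+1] - n[t].
    num_params = len(gradients[0])
    updates = [0.0] * num_params
    for t in range(len(diffs)):
        # Inner sum: discounted future temporal differences.
        acc = sum(lam ** (j - t) * diffs[j] for j in range(t, len(diffs)))
        # Outer sum over moves, weighted by the gradient at move t.
        for k in range(num_params):
            updates[k] += alpha * gradients[t][k] * acc
    return updates
```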

So the modifications necessary to my chess program to get this to work
were simply:

1) Record the terminal search positions during each search
2) Record whether I correctly predicted the opponent's move during pondering
(this is useful to avoid learning parameters from a blunder by the opponent)
3) A function to compute all the necessary score differences and derivatives
and sum them up.

I also needed the following to make everything easier:

1) A list of evaluation parameters in a single array
2) Functions to read/write these parameters to a disk file
3) A function to set these parameters as the actual parameters used in
my eval function (i.e. KNIGHT_VALUE = score_parameter[3];)
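A minimal sketch of that bookkeeping (all names and values here are
hypothetical, not taken from any engine):

```python
# One flat array of tunable evaluation parameters.
score_parameter = [100.0, 320.0, 330.0, 500.0, 900.0]

def save_parameters(path, params):
    # One parameter per line, plain text.
    with open(path, "w") as f:
        f.write("\n".join(str(p) for p in params))

def load_parameters(path):
    with open(path) as f:
        return [float(line) for line in f if line.strip()]

EVAL = {}  # stands in for the engine's global evaluation constants

def apply_parameters(params):
    # Mirrors the post's example: KNIGHT_VALUE = score_parameter[3];
    EVAL["KNIGHT_VALUE"] = params[3]
```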

I am still working out my implementation of TD learning.  For example, I am
only learning from losses right now.  I also need to work out an automated
way of scaling the size of the update made to a parameter.  Another issue
is that my evaluation function has a resolution of 1/100 of a pawn, and the
KnightCap authors suggest a higher resolution to get TD learning to work best.
I will probably go to a higher resolution for scoring the positions, but
will still likely work in 1/100 of a pawn in my search.

If anyone else is interested in trying this, I'd be interested to share ideas
about it as our implementations develop.

 - Dan



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.