Author: Dan Homan
Date: 05:41:09 07/02/00
I basically just followed the information in the papers by the KnightCap research group. They have a very clear description of TDLeaf learning. In general, all you need is a record of the terminal search position (i.e. the leaf position from your pv) for each move your engine made during the game. You should also keep a record of which moves of the opponent were correctly predicted during pondering.

With this list of terminal search positions, you use your scoring function to calculate incremental differences between scores as the game progressed, e.g. d[i] = score[i+1] - score[i]. The KnightCap guys smooth the scoring function with a hyperbolic tangent (tanh), so d[i] = n[i+1] - n[i], where n[i] = tanh(0.255*score[i]) for score[i] given in units of 1 pawn.

You also need to compute the derivatives with respect to each evaluation parameter at each position. This is trivial to do: just add some small amount to the parameter in question, recompute the score n[i], subtract the unmodified n[i], and divide by the small amount you added to the parameter.

Finally, you need to do a double sum over a combination of these computed numbers (score differences and derivatives). This double sum, as well as the detailed formulas, are all given in the ICCA papers by the KnightCap group. I got postscript versions from the "Machine Learning in Games" website listed in the computer chess resource center on this site.

So the modifications necessary to my chess program to get this to work were simply:

1) Record the terminal search positions during each search.

2) Record whether I correctly predicted the opponent's move during pondering (this is useful to avoid using a blunder by the opponent to learn parameters).

3) A function to compute all the necessary score differences and derivatives and sum them up.
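To make the steps above concrete, here is a minimal sketch of one TDLeaf(lambda) pass over a finished game. The 0.255 tanh scaling is from the KnightCap papers; everything else (the names tdleaf_update and evaluate, the lambda/alpha/epsilon values, and the exact shape of the double sum, which I've written as the usual outer sum over positions and inner lambda-discounted sum over future differences) is my own assumption, not a copy of their formulas:

```python
import math

LAMBDA = 0.7   # temporal discount; a placeholder value, not from the papers
ALPHA = 1.0    # learning rate; a placeholder value
EPS = 0.01     # finite-difference step for the derivatives

def squash(score_pawns):
    """Smooth the raw evaluation (in pawns), as the KnightCap papers do."""
    return math.tanh(0.255 * score_pawns)

def tdleaf_update(leaf_positions, params, evaluate):
    """One TDLeaf(lambda) pass over the recorded leaf positions of a game.

    `evaluate(position, params)` is a hypothetical hook that returns the
    static score (in pawns) of a stored leaf position under `params`.
    Returns the suggested change for each evaluation parameter.
    """
    n = [squash(evaluate(p, params)) for p in leaf_positions]
    d = [n[i + 1] - n[i] for i in range(len(n) - 1)]  # d[i] = n[i+1] - n[i]

    deltas = [0.0] * len(params)
    for i in range(len(n) - 1):
        # Inner sum: future temporal differences, discounted by lambda.
        td_sum = sum((LAMBDA ** (j - i)) * d[j] for j in range(i, len(d)))
        for k in range(len(params)):
            # Finite-difference derivative of n[i] w.r.t. parameter k:
            # bump the parameter, re-score, subtract, divide by the bump.
            bumped = list(params)
            bumped[k] += EPS
            grad = (squash(evaluate(leaf_positions[i], bumped)) - n[i]) / EPS
            deltas[k] += ALPHA * grad * td_sum
    return deltas
```

Re-scoring every leaf for every parameter is the brute-force version of the derivative step; it is slow but matches the "add a small amount and recompute" recipe exactly.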
I also needed the following to make everything easier:

1) A list of evaluation parameters in a single array.

2) Functions to read/write these parameters to a disk file.

3) A function to set these parameters as the actual parameters used in my eval function (i.e. KNIGHT_VALUE = score_parameter[3];).

I am still working out my implementation of TD learning. For example, I am only learning from losses right now. I also need to work out an automated way of scaling the size of the update made to a parameter. Another issue is that my evaluation function has a resolution of 1/100 of a pawn, and the KnightCap guys suggest a higher resolution to get TD learning to work best. I will probably go to a higher resolution for scoring the positions, but will still likely work in 1/100 of a pawn in my search.

If anyone else is interested in trying this, I'd be interested to share ideas about it as our implementations develop.

- Dan