Computer Chess Club Archives


Subject: Re: Information on implementation

Author: Alessandro Damiani

Date: 08:15:37 07/02/00


On July 02, 2000 at 08:41:09, Dan Homan wrote:

>I basically just followed the information in the papers by the KnightCap
>research group.  They have a very clear description of TDLeaf learning.
>
>In general, all you need is a record of the terminal search positions (i.e.
>the leaf positions from your PV) for each move your engine made during the
>game.  You should also keep a record of which of the opponent's moves were
>correctly predicted during pondering.
>
>With this list of terminal search positions, you will use your scoring
>function to calculate incremental differences between scores as the game
>progresses, e.g. d[i] = score[i+1] - score[i].  The KnightCap guys
>smooth the scoring function with a hyperbolic tangent (tanh), so
>d[i] = n[i+1] - n[i], where n[i] = tanh(0.255*score[i]) for score[i]
>given in units of 1 pawn.
>
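
If I understand the step above correctly, a minimal C sketch of the smoothing
could look like this (the array names and MAX_GAME_LENGTH are just
placeholders, and the leaf scores are assumed to already be in pawns):

  #include <math.h>

  #define MAX_GAME_LENGTH 512

  double leaf_score[MAX_GAME_LENGTH]; /* score[i] of the i-th terminal position, in pawns */
  double n[MAX_GAME_LENGTH];          /* squashed scores n[i] = tanh(0.255*score[i])      */
  double d[MAX_GAME_LENGTH];          /* temporal differences d[i] = n[i+1] - n[i]        */

  void compute_differences(int num_positions)
  {
      int i;
      for (i = 0; i < num_positions; i++)
          n[i] = tanh(0.255 * leaf_score[i]);
      for (i = 0; i < num_positions - 1; i++)
          d[i] = n[i + 1] - n[i];
  }
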
>You also need to compute the derivatives with respect to each evaluation
>parameter at each position.  This is trivial to do: just add some small
>amount to the parameter in question, recompute the score n[i], subtract the
>unmodified n[i], and divide by the small amount you added to the parameter.
>
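
The finite-difference step might then look roughly like this, assuming a
hypothetical evaluate(pos, params) that rescores a stored leaf position (in
pawns) with a given parameter set; both names are made up for illustration:

  #define EPSILON 0.01   /* small amount added to the parameter */

  /* numerical estimate of dn[i]/dparams[j] for one leaf position */
  double derivative(const Position *pos, double *params, int j)
  {
      double n_base, n_perturbed, saved;

      n_base = tanh(0.255 * evaluate(pos, params));      /* unmodified n[i]       */

      saved = params[j];
      params[j] += EPSILON;                              /* perturb one parameter */
      n_perturbed = tanh(0.255 * evaluate(pos, params)); /* recomputed n[i]       */
      params[j] = saved;                                 /* restore               */

      return (n_perturbed - n_base) / EPSILON;
  }
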
>Finally, you need to do a double sum over a combination of these computed
>numbers (score differences and derivatives).  This double sum, as well as
>the detailed formulas, is given in the ICCA papers by the KnightCap
>group.  I got PostScript versions from the "Machine Learning in Games"
>website listed in the computer chess resource center on this site.
>
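
If I remember the KnightCap paper correctly, that double sum amounts to an
update of roughly the form w[j] += alpha * sum_i (dn[i]/dw[j]) * sum_{m>=i}
lambda^(m-i) * d[m], but the paper is the authority on the exact formula.
Continuing the sketch above, with a precomputed table dn_dparam[i][j] of the
derivatives (again a made-up name):

  #define NUM_PARAMS 64

  double dn_dparam[MAX_GAME_LENGTH][NUM_PARAMS]; /* dn[i]/dw[j], filled in beforehand */

  void tdleaf_update(double *params, int num_params, int num_positions,
                     double alpha, double lambda)
  {
      int i, j, m;

      for (j = 0; j < num_params; j++) {
          double total = 0.0;
          for (i = 0; i < num_positions - 1; i++) {
              double td = 0.0, decay = 1.0;
              for (m = i; m < num_positions - 1; m++) {
                  td += decay * d[m];   /* sum over lambda^(m-i) * d[m] */
                  decay *= lambda;
              }
              total += dn_dparam[i][j] * td;
          }
          params[j] += alpha * total;   /* one gradient-style step per parameter */
      }
  }
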
>So the modifications necessary to my chess program to get this to work
>were simply:
>
>1) Record the terminal search positions during each search
>2) Record whether I correctly predicted the opponent's move during pondering
>(this is useful to avoid learning parameters from a blunder by the opponent)
>3) A function to compute all the necessary score differences and derivatives
>and sum them up.
>
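
For the record keeping in 1) and 2), something as simple as the following
might do; the Position type stands in for whatever the engine already uses to
store a position:

  /* one entry per move the engine played during the game */
  typedef struct {
      Position leaf;       /* terminal (leaf) position of the PV for that search      */
      int      predicted;  /* 1 if the opponent's reply was predicted while pondering */
  } LearnRecord;

  LearnRecord game_record[MAX_GAME_LENGTH];
  int         num_records = 0;
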
>I also needed the following to make everything easier:
>
>1) A list of evaluation parameters in a single array
>2) Functions to read/write these parameters to a disk file
>3) A function to set these parameters as the actual parameters used in
>my eval function (e.g. KNIGHT_VALUE = score_parameter[3];)
>
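
A tiny sketch of that plumbing, with made-up names and indices:

  #include <stdio.h>

  double score_parameter[NUM_PARAMS];  /* all tunable eval weights in one array     */

  int KNIGHT_VALUE;                    /* the value the eval function actually uses */

  void apply_parameters(void)
  {
      /* copy the learned weight into the eval's own variable */
      KNIGHT_VALUE = (int)score_parameter[3];
  }

  void save_parameters(const char *filename)
  {
      FILE *f = fopen(filename, "w");
      int j;

      if (f == NULL)
          return;
      for (j = 0; j < NUM_PARAMS; j++)
          fprintf(f, "%f\n", score_parameter[j]);
      fclose(f);
  }
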
>I am still working out my implementation of TD learning.  For example, I am
>only learning from losses right now.  I also need to work out an automated
>way of scaling the size of the update made to a parameter.  Another issue
>is that my evaluation function has a resolution of 1/100 of a pawn, and the
>KnightCap guys suggest a higher resolution to get TD learning to work best.
>I will probably go to a higher resolution for scoring the positions, but
>will still likely work in 1/100 of a pawn in my search.
>
>If anyone else is interested in trying this, I'd be interested to share ideas
>about it as our implementations develop.
>
> - Dan

Hi Dan!

I am interested in TD learning. I have had the papers for a long time, but I
don't have time to program. I think TD is worth experimenting with.

Alessandro


