Author: James Swafford
Date: 21:11:23 10/20/01
I'm working back through some papers by Baxter, Tridgell and Weaver.
The one I'm looking at now is entitled "TDLeaf(λ): Combining
Temporal Difference Learning with Game-Tree Search."
I get the basic idea, but I have some specific questions I'm
hoping some of you can help me with.
The big goal, as I understand it, is to minimize the error between
the evaluation's predicted outcome and the real outcome by using
the differences in the predicted outcome from move to move:
td(t) = eval(pos(t+1),w) - eval(pos(t),w), where w is a
vector of evaluation parameter values.
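To check that I have the definition right, here's how I'd compute
the sequence of temporal differences for one game. This is a rough
sketch in Python; the names are mine, not the paper's.

    # 'evals' holds eval(pos(t), w) for each position t in the game,
    # in order. td(t) = eval(pos(t+1), w) - eval(pos(t), w).
    def temporal_differences(evals):
        return [evals[t + 1] - evals[t] for t in range(len(evals) - 1)]

    # Example: the predicted outcome drifts from 0.5 up to 0.8.
    print(temporal_differences([0.5, 0.6, 0.55, 0.8]))
    # -> roughly [0.1, -0.05, 0.25]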
First, it's obvious that the evaluation parameters must somehow
be 'vectorized', or placed in a vector. OK. Additionally,
eval(pos,w) must be a differentiable function of its parameters
w (w1...wk).
Exactly what does that mean? I've had some calculus, and I've
had some linear algebra, but that eludes me.
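The closest I can get to a concrete picture: if the eval were a
plain weighted sum of feature values (my assumption for the sake of
illustration, not necessarily the paper's), then each partial
derivative would just be a feature value.

    # Sketch, assuming a linear eval:
    #   eval(pos, w) = w[0]*f0(pos) + ... + w[k-1]*fk-1(pos)
    # where the f's are feature values (material, mobility, ...).
    def linear_eval(features, w):
        return sum(wi * fi for wi, fi in zip(w, features))

    # For this form, d(eval)/d(w[i]) = features[i], so the gradient
    # with respect to w is just the position's feature vector.
    def gradient(features, w):
        return list(features)

Is that the sort of differentiability the paper demands, or is
there more to it?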
Next - at the end of the game, TDLeaf will update the parameter
vector. It does this, if I understand the paper correctly,
by the following:
w = w + lr * s * t,
where lr is a scalar for learning rate, s is the sum of
the vectors of partial derivatives of the evaluation at each
position with respect to its parameters (w1...wk), and I don't
want to get into t right now for fear of complicating my question.
How do I compute a vector of partial derivatives of the eval
at any position with respect to its parameter weights?
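To make my question concrete, here's the update as I currently read
it, again assuming the linear eval above (so the gradient at each
position is just its feature vector):

    # Sketch of the end-of-game update as I read it. 'lr' is the
    # learning rate; 't' is the term I'm deferring, so I treat it
    # here as an already-computed scalar.
    def update(w, game_positions_features, lr, t):
        k = len(w)
        s = [0.0] * k
        for features in game_positions_features:
            for i in range(k):
                s[i] += features[i]  # gradient component = feature value
        return [w[i] + lr * s[i] * t for i in range(k)]

If that's roughly right, then my question reduces to: what plays
the role of features[i] when the eval isn't a simple weighted sum?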
Forgive me if I'm being horribly unclear or I've botched the
algorithm; I'm trying to make sense of the thing.
--
James