# Computer Chess Club Archives

## Messages

### Subject: Pseudo-code for TD learning

Author: Dan Homan

Date: 07:25:55 07/06/00

```I've got some further e-mail requests about my TD learning code in EXchess,
and I thought it might be useful to write up some pseudo-code.

This is basically a direct translation of the formal presentation of
formulas in the knightcap papers, so there is nothing new here.  In fact,
you should definitely read those papers before trying to implement this,
because there will be special cases and considerations for your
particular implementation.  Also, I don't claim to have got it right... so
far it seems to work for the piece values. The pseudo-code corresponds pretty
closely (minus special cases and some debugging stuff) to my implementation in
EXchess.

The code assumes you have the following things

eval(position p)  - evaluation function, returns in centipawns
set_eval_param()  - function to set the eval parameters (e.g. KNIGHT_VALUE =
score_param; etc...)
leaf_pos[]  - array of final leaf positions from searches during the game
score_param[]  - array of scoring parameters, stored in a file
write_param()  - function to write parameters to a file

void learn_parameters(int game_result) {   // game_result is -1,0,or 1

float s;  // array of position scores
float d;  // array of position score differences
float ds;      // scoring derivative
float sum1, sum2;
float alpha = 1.0, Lambda = 0.7;  // alpha controls size of the parameter
// updates and Lambda controls how much
// the later score differences in a game
// influence the contribution of a
// particular position's score derivative

/* loop to setup position scores and score differences */
for(int i = 1; i < N; i++) {
s[i] = tanh(0.00255*eval(leaf_pos[i]));
if(i > 1) {
d[i-1] = s[i] - s[i-1];
if(i==N-1) d[i] = float(game_result) - s[i];
/* don't use opponent blunders to learn scores */
if(d[i] > 0 && !predicted_opponent_move[i]) d[i] = 0;
}
}

/* loop over eval parameters */
for(int j = 1; j <= TOTAL_PARAMETERS; j++) {
sum1 = 0.0;
/* loop over game positions */
for(int i = 1; i < N; i++) {
// compute the scoring derivative by adding 1/100 of pawn
score_param[j] += 1;
set_eval_param();
ds = (tanh(0.00255*eval(leaf_pos[i])) - s[i])/0.01;
score_param[j] -= 1;
// now sum over all score differences to end of game,
//  weighting down the later positions by Lambda^(m-i)
sum2 = 0.0;
for(int m = i; m < N; m++) {
sum2 += pow(Lambda,(m-i))*d[m];
}
// add in the contribution of this position to the parameter update
sum1 += ds*sum2;
}
// update the scoring parameter;
score_param[j] += alpha*sum1;
}

// write the updated parameters out to a file
write_param();

}

```