Author: Dan Homan

Date: 07:25:55 07/06/00

I've got some further e-mail requests about my TD learning code in EXchess, and I thought it might be useful to write up some pseudo-code. This is basically a direct translation of the formal presentation of formulas in the knightcap papers, so there is nothing new here. In fact, you should definitely read those papers before trying to implement this, because there will be special cases and considerations for your particular implementation. Also, I don't claim to have got it right... so far it seems to work for the piece values.

The pseudo-code corresponds pretty closely (minus special cases and some debugging stuff) to my implementation in EXchess.

The code assumes you have the following things

    eval(position p)  - evaluation function, returns in centipawns
    set_eval_param()  - function to set the eval parameters
                        (e.g. KNIGHT_VALUE = score_param[3]; etc...)
    leaf_pos[]        - array of final leaf positions from searches
                        during the game
    score_param[]     - array of scoring parameters, stored in a file
    write_param()     - function to write parameters to a file

    void learn_parameters(int game_result)
    {
      // game_result is -1,0,or 1
      float s[256];   // array of position scores
      float d[256];   // array of position score differences
      float ds;       // scoring derivative
      float sum1, sum2;
      float alpha = 1.0, Lambda = 0.7;
                      // alpha controls size of the parameter
                      // updates and Lambda controls how much
                      // the later score differences in a game
                      // influence the contribution of a
                      // particular position's score derivative

      /* loop to setup position scores and score differences */
      for(int i = 1; i < N; i++) {
        s[i] = tanh(0.00255*eval(leaf_pos[i]));
        if(i > 1) {
          d[i-1] = s[i] - s[i-1];
          if(i==N-1) d[i] = float(game_result) - s[i];
          /* don't use opponent blunders to learn scores */
          if(d[i] > 0 && !predicted_opponent_move[i]) d[i] = 0;
        }
      }

      /* loop over eval parameters */
      for(int j = 1; j <= TOTAL_PARAMETERS; j++) {
        sum1 = 0.0;
        /* loop over game positions */
        for(int i = 1; i < N; i++) {
          // compute the scoring derivative by adding 1/100 of pawn
          score_param[j] += 1;
          set_eval_param();
          ds = (tanh(0.00255*eval(leaf_pos[i])) - s[i])/0.01;
          score_param[j] -= 1;
          // now sum over all score differences to end of game,
          // weighting down the later positions by Lambda^(m-i)
          sum2 = 0.0;
          for(int m = i; m < N; m++) {
            sum2 += pow(Lambda,(m-i))*d[m];
          }
          // add in the contribution of this position to the parameter update
          sum1 += ds*sum2;
        }
        // update the scoring parameter;
        score_param[j] += alpha*sum1;
      }
      // write the updated parameters out to a file
      write_param();
    }
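For anyone who wants to check the arithmetic in isolation, here is a minimal, engine-free sketch of the same sums. It is not the EXchess code: the function names (squash, temporal_differences, discounted_sums, parameter_delta) are made up for illustration, indexing is 0-based, the opponent-blunder filter is left out, and the inner loop over m is replaced by an equivalent backward recurrence, w[t] = d[t] + Lambda*w[t+1]. The derivative values ds[t] are assumed to be supplied by the caller, for example from the finite difference shown in the pseudo-code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Squash a centipawn evaluation into (-1, 1). The 0.00255 scale is the one
// used in the pseudo-code above.
double squash(double centipawns) { return std::tanh(0.00255 * centipawns); }

// Temporal differences d[t] = s[t+1] - s[t]; the last difference is taken
// against the game result (-1, 0, or 1).
std::vector<double> temporal_differences(const std::vector<double>& s,
                                         int game_result) {
    assert(!s.empty());
    std::vector<double> d(s.size());
    for (std::size_t t = 0; t + 1 < s.size(); ++t) d[t] = s[t + 1] - s[t];
    d.back() = static_cast<double>(game_result) - s.back();
    return d;
}

// w[t] = sum over m >= t of lambda^(m - t) * d[m], computed backwards in
// O(N) instead of with a nested loop.
std::vector<double> discounted_sums(const std::vector<double>& d,
                                    double lambda) {
    std::vector<double> w(d.size());
    double running = 0.0;
    for (std::size_t t = d.size(); t-- > 0;) {
        running = d[t] + lambda * running;
        w[t] = running;
    }
    return w;
}

// Change to one parameter: alpha * sum over t of ds[t] * w[t], where ds[t]
// is the derivative of the squashed score at position t with respect to
// that parameter (assumed to be computed elsewhere).
double parameter_delta(const std::vector<double>& ds,
                       const std::vector<double>& w, double alpha) {
    assert(ds.size() == w.size());
    double sum = 0.0;
    for (std::size_t t = 0; t < ds.size(); ++t) sum += ds[t] * w[t];
    return alpha * sum;
}
```

A caller would fill s with squash(eval(...)) for each leaf position, build d and w once per game, and then call parameter_delta once per parameter with that parameter's derivatives. Computing w once also means every parameter is updated against the same score differences.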

