Author: Peter McKenzie
Date: 10:48:18 07/07/03
On July 07, 2003 at 07:40:46, Bas Hamstra wrote:

>I tried TD learning, it is interesting. However I have never been able to let TD
>outperform manual tuning. If you start playing with a version with all
>parameters set to zero, it will quickly learn near-realistic values for most
>parameters. In my case not for all parameters: for instance it insisted on
>setting a positive value for doubled pawns (because of open files?). An even bigger
>problem for me was that it kept tuning some parameters up and up, to
>ridiculously high values. I thought about this, and in my opinion this is a
>cause-effect problem. Take mobility as an example: suppose program A loses a
>piece because of some combination. Because of the piece it eventually loses the
>game. Now TD starts analyzing, and it concludes program B won the game because of
>mobility, because with one piece less program A obviously has less mobility.
>However, mobility is not the real cause, it is an *effect* of being a piece down.
>Therefore this parameter will go crazy: every time a piece is lost, it will tune
>up mobility.

Very interesting.  Did you try turning learning off when the score is above a
certain threshold?  Should we really be tuning +5 so that it gets closer to +6?
As a chess player, I don't learn very much when I'm already a rook up ... it's
just technique then.  Of course, if I lose after being a rook up then I might
learn something :-)

>I tried some other things than TD. I remember one very simple scheme worked very
>well too: after the game, you simply ask what the winner did more than the
>loser, simply sum up the values of the parameters for all root positions. Change
>the value of the parameters accordingly.

That sounds nice and simple.

>Best regards,
>Bas.
>
>On July 07, 2003 at 01:14:43, Peter McKenzie wrote:
>
>>I'm interested in trying some automated evaluation tuning, is anyone else doing
>>this at the moment?  Interested in hearing about any successes or failures in
>>this area.
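The simple post-game scheme Bas describes could be sketched roughly like this (a minimal illustration, not code from any actual engine; the feature names and the `update_weights` helper are made up for the example, and each position is just a dict of feature counts):

```python
def update_weights(weights, winner_positions, loser_positions, step=1):
    """For each feature, compare the summed counts over the winner's and
    loser's root positions and nudge the weight toward whichever side
    had more of that feature."""
    for feature in weights:
        winner_total = sum(pos[feature] for pos in winner_positions)
        loser_total = sum(pos[feature] for pos in loser_positions)
        if winner_total > loser_total:
            weights[feature] += step   # winner had more of it: value it higher
        elif winner_total < loser_total:
            weights[feature] -= step   # loser had more of it: value it lower
    return weights

# Toy example: two root positions per side, counts invented for illustration.
weights = {"mobility": 10, "doubled_pawns": -15}
winner = [{"mobility": 30, "doubled_pawns": 1}, {"mobility": 32, "doubled_pawns": 1}]
loser  = [{"mobility": 25, "doubled_pawns": 2}, {"mobility": 24, "doubled_pawns": 3}]
print(update_weights(weights, winner, loser))
# → {'mobility': 11, 'doubled_pawns': -16}
```

Note that this scheme has exactly the cause-effect problem Bas points out for TD: the winner usually ends the game with more mobility simply because it is material up, so the mobility weight creeps upward every game regardless of whether mobility caused the win.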
>>
>>TD learning looks like the most obvious thing to start thinking about; the
>>following paper is a good introduction:
>>
>>http://cs.anu.edu.au/~Lex.Weaver/pub_sem/publications/ICCA-98_equiv.pdf
>>
>>Also, here is Dan Homan's pseudo code from a few years back:
>>
>>http://fortuna.iasi.rdsnet.ro/ccc/ccc.php?art_id=117970
>>
>>I'm not 100% convinced by TD learning, but it certainly looks interesting.
>>
>>As I understand it, TD learning basically uses the scores from the next few
>>positions to give a (hopefully) better estimate of the score for the current
>>position.  It then adjusts the eval weights so that the eval (or in the case of
>>TDLeaf, the eval of the position at the tip of the PV) moves towards the
>>estimate.
>>
>>OK, technically it uses all the remaining positions in the game for its score
>>estimate, but in practice this is heavily weighted towards the next few
>>positions.  It's a pretty cool idea really.
>>
>>One problem I see is that different features will be tuned at different rates.
>>Common features will of course be tuned quite quickly, while rare features that
>>occur only occasionally will be tuned slowly.  This is to some extent
>>unavoidable, but maybe it makes sense to slow the rate of change for weights of
>>common features before doing the same with rare features.  Possibly a minor
>>point though.
>>
>>Peter
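For readers following along, the TD(lambda) update being discussed can be sketched for a linear evaluation (eval = dot product of weights and features) as below. This is an illustrative sketch under that linearity assumption, not the algorithm from the linked paper or pseudo code; for TDLeaf one would feed in the features of each search's PV-leaf position rather than the root. Each temporal difference d_t = score[t+1] - score[t] is credited back to earlier positions with exponentially decaying weight, which is why, as Peter notes, the estimate is dominated by the next few positions:

```python
def td_lambda_update(weights, feature_vectors, scores, alpha=0.5, lam=0.5):
    """TD(lambda) update for a linear eval.

    feature_vectors[t]: feature values of position t (or its PV leaf, for TDLeaf)
    scores[t]:          search score of position t
    alpha:              learning rate; lam: credit-decay factor in [0, 1]
    """
    n = len(scores)
    for t in range(n - 1):
        diff = scores[t + 1] - scores[t]       # temporal difference d_t
        for k in range(t + 1):                 # credit all earlier positions...
            decay = lam ** (t - k)             # ...with decaying responsibility
            for i, f in enumerate(feature_vectors[k]):
                weights[i] += alpha * decay * diff * f
    return weights

# Toy run: three positions, two features, scores rising toward a win.
w = td_lambda_update([0.0, 0.0], [[1, 0], [1, 1], [0, 1]], [0, 10, 30])
print(w)
```

With lam near 0 this degenerates to one-step bootstrapping; with lam = 1 every position is credited with the full remaining score change, which is closer to plain outcome-based tuning. It also makes Peter's rate-imbalance point concrete: a feature that is nonzero in most positions accumulates updates every game, while a rare feature barely moves.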