Computer Chess Club Archives



Subject: Re: Automated Evaluation Learning

Author: Bas Hamstra

Date: 04:40:46 07/07/03



I tried TD learning, and it is interesting. However, I have never been able to make TD
outperform manual tuning. If you start playing with a version with all
parameters set to zero, it quickly learns near-realistic values for most
parameters. In my case, not for all of them: for instance, it insisted on
setting a positive value for doubled pawns (because of open files?). An even bigger
problem for me was that it kept tuning some parameters up and up, to
ridiculously high values. I thought about this, and in my opinion it is a
cause-and-effect problem. Take mobility as an example: suppose program A loses a
piece to some combination, and because of the lost piece it eventually loses the
game. Now TD starts analyzing and concludes that program B won the game because of
mobility, since with one piece less program A obviously has less mobility.
However, mobility is not the real cause; it is an *effect* of being a piece down.
Therefore this parameter goes crazy: every time a piece is lost, TD tunes
mobility up.
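To make the cause-and-effect trap concrete, here is a minimal sketch of a TD(lambda)-style update on a linear evaluation. The feature names, values, and constants are purely illustrative (not Bas's or anyone's actual engine code); the point is only that when a lost piece drags mobility down with it, the mobility weight absorbs part of the credit even though material was the real cause:

```python
# Hypothetical sketch of a TD(lambda)-style update on a linear evaluation.
# Feature names and values are illustrative, not taken from any real engine.

LAMBDA = 0.7  # decay of credit toward later positions
ALPHA = 0.01  # learning rate

def td_lambda_update(weights, feature_vectors, evals, outcome):
    """One pass of TD(lambda) over a single game.

    feature_vectors[t]: dict of feature -> value at root position t
    evals[t]: the linear eval (w . x) at position t
    outcome: final game result, used as the target after the last position
    """
    targets = evals[1:] + [outcome]
    n = len(evals)
    for t in range(n):
        # Sum of TD errors from position t onward, discounted by lambda
        grad_scale = sum(
            (LAMBDA ** (k - t)) * (targets[k] - evals[k])
            for k in range(t, n)
        )
        for f, x in feature_vectors[t].items():
            # For a linear eval the gradient w.r.t. each weight is the feature value,
            # so any feature that moves with the error gets adjusted
            weights[f] = weights.get(f, 0.0) + ALPHA * grad_scale * x
    return weights

# Losing a piece lowers mobility along with material, so the loss
# adjusts *both* weights, though material was the real cause:
weights = {"material": 1.0, "mobility": 0.1}
game = [
    {"material": 0.0, "mobility": 0.3},    # roughly equal position
    {"material": -3.0, "mobility": -0.4},  # piece lost; mobility falls too
]
evals = [sum(weights[f] * v for f, v in pos.items()) for pos in game]
weights = td_lambda_update(weights, game, evals, outcome=-1.0)
```

In this toy run both the material and the mobility weights move, even though only material actually explains the loss; that entanglement is exactly the cause-versus-effect problem described above.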

I tried some other things besides TD. I remember one very simple scheme that also
worked very well: after the game, you simply ask what the winner did more of than
the loser, by summing up the feature values of the parameters over all root
positions, and then change the values of the parameters accordingly.
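The simple post-game scheme above might be sketched as follows (the feature names, step size, and data are hypothetical; the post gives no code): for each parameter, sum its feature value over the winner's root positions and over the loser's, then nudge the weight in the direction of the difference:

```python
# Hypothetical sketch of the post-game scheme described above:
# sum each feature over the winner's and the loser's root positions,
# then move the weight toward whatever the winner had more of.

ALPHA = 0.01  # step size, illustrative only

def adjust_after_game(weights, winner_positions, loser_positions):
    """Each position is a dict of feature -> value at a search root."""
    for f in weights:
        winner_sum = sum(pos.get(f, 0.0) for pos in winner_positions)
        loser_sum = sum(pos.get(f, 0.0) for pos in loser_positions)
        # If the winner accumulated more of this feature, raise its weight;
        # if the loser had more of it, lower the weight.
        weights[f] += ALPHA * (winner_sum - loser_sum)
    return weights

weights = {"mobility": 0.10, "doubled_pawns": -0.05}
winner = [{"mobility": 0.6, "doubled_pawns": 0.0},
          {"mobility": 0.8, "doubled_pawns": 0.0}]
loser = [{"mobility": 0.2, "doubled_pawns": 1.0},
         {"mobility": 0.1, "doubled_pawns": 2.0}]
weights = adjust_after_game(weights, winner, loser)
```

Here the loser had more doubled pawns, so that penalty deepens, and the winner had more mobility, so that bonus grows; the update needs no per-position targets at all, which is what makes the scheme so simple.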


Best regards,
Bas.


On July 07, 2003 at 01:14:43, Peter McKenzie wrote:

>I'm interested in trying some automated evaluation tuning, is anyone else doing
>this at the moment?  Interested in hearing about any successes or failures in
>this area.
>
>TD learning looks like the most obvious thing to start thinking about, the
>following paper is a good introduction:
>
>http://cs.anu.edu.au/~Lex.Weaver/pub_sem/publications/ICCA-98_equiv.pdf
>
>Also, here is Dan Homan's pseudo code from a few years back:
>
>http://fortuna.iasi.rdsnet.ro/ccc/ccc.php?art_id=117970
>
>
>I'm not 100% convinced by TD learning, but it certainly looks interesting.
>
>As I understand it TD learning basically uses the scores from the next few
>positions to give a (hopefully) better estimate of the score for the current
>position.  It then adjusts the eval weights so that the eval (or in the case of
>TDLeaf, the eval of the position at the tip of the PV) moves towards the
>estimate.
>
>OK, technically it uses all the remaining positions in the game for its score
>estimate, but in practice this is heavily weighted towards the next few
>positions.  It's a pretty cool idea really.
>
>One problem I see is that different features will be tuned at different rates.
>Common features will of course be tuned quite quickly while rare features that
>occur only occasionally will be tuned slowly.  This is to some extent
>unavoidable but maybe it makes sense to slow the rate of change for weights of
>common features before doing the same with rare features.  Possibly a minor
>point though.
>
>Peter





Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.