Computer Chess Club Archives



Subject: Re: Automated Evaluation Learning

Author: Peter McKenzie

Date: 10:48:18 07/07/03



On July 07, 2003 at 07:40:46, Bas Hamstra wrote:

>I tried TD learning, and it is interesting. However, I have never been able to
>make TD outperform manual tuning. If you start playing with a version with all
>parameters set to zero, it will quickly learn near-realistic values for most
>parameters. In my case, not for all parameters: for instance, it insisted on
>setting a positive value for doubled pawns (because of open files?). An even
>bigger problem for me was that it kept tuning some parameters up and up, to
>ridiculously high values. I thought about this, and in my opinion this is a
>cause-and-effect problem. Take mobility as an example: suppose program A loses
>a piece because of some combination. Because of the lost piece, it eventually
>loses the game. Now TD starts analyzing, and it concludes that program B won
>the game because of mobility, because with one piece less program A obviously
>has less mobility. However, mobility is not the real cause; it is an *effect*
>of being a piece down. Therefore this parameter will go crazy: every time a
>piece is lost, TD will tune up mobility.


Very interesting.  Did you try turning learning off when the score is above a
certain threshold?  Should we really be tuning +5 so that it gets closer to +6?
As a chess player, I don't learn very much when I'm already a rook up ... it's
just technique at that point.  Of course, if I lose after being a rook up, then
I might learn something :-)
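To make the threshold idea concrete, here is a rough sketch of a gated TD-style weight update. Everything here is illustrative (the function name, feature vectors, learning rate, and the 300-centipawn cutoff are all made up), not code from any actual engine:

```python
def td_update(weights, features, score, target, lr=0.001, threshold=300):
    """Nudge eval weights toward the TD target, but skip positions
    whose score is already decisive (e.g. a piece up or more).

    weights:  list of eval parameter values
    features: feature values of the position, one per weight
    score:    current eval of the position (centipawns)
    target:   TD target score estimated from later positions
    """
    if abs(score) >= threshold:       # already winning/losing: learn nothing
        return weights
    error = target - score            # TD error in centipawns
    return [w + lr * error * f for w, f in zip(weights, features)]
```

With a gate like this, a game that is decided by a lost piece stops feeding updates once the score passes the threshold, so mobility would no longer get credit for positions that are really just "a piece down".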

>
>I tried some things other than TD. I remember one very simple scheme that
>worked very well too: after the game, you simply ask what the winner did more
>of than the loser, by summing up the values of the parameters over all root
>positions, and then change the values of the parameters accordingly.

That sounds nice and simple.
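As I read it, the scheme could be sketched roughly like this (the feature names, learning rate, and dict-based representation are my own invention for illustration, not Bas's actual code):

```python
def tune_from_game(weights, winner_sums, loser_sums, lr=0.01):
    """weights, winner_sums, loser_sums: dicts keyed by eval feature.
    winner_sums[f] is feature f summed over the winner's root positions;
    likewise loser_sums for the loser.  Features the winner had more of
    get nudged up; features the loser had more of get nudged down."""
    return {f: w + lr * (winner_sums.get(f, 0) - loser_sums.get(f, 0))
            for f, w in weights.items()}

# Example: the winner had more mobility, the loser more doubled pawns,
# so mobility's weight rises and the doubled-pawn weight falls.
new = tune_from_game({"mobility": 1.0, "doubled_pawn": -0.2},
                     {"mobility": 120, "doubled_pawn": 3},
                     {"mobility": 90, "doubled_pawn": 10})
```

Note that this has the same cause-versus-effect hazard as TD: the winner of a game decided by material will also tend to have had more mobility.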

>
>
>Best regards,
>Bas.
>
>
>On July 07, 2003 at 01:14:43, Peter McKenzie wrote:
>
>>I'm interested in trying some automated evaluation tuning; is anyone else
>>doing this at the moment?  I'm interested in hearing about any successes or
>>failures in this area.
>>
>>TD learning looks like the most obvious thing to start thinking about; the
>>following paper is a good introduction:
>>
>>http://cs.anu.edu.au/~Lex.Weaver/pub_sem/publications/ICCA-98_equiv.pdf
>>
>>Also, here is Dan Homan's pseudo code from a few years back:
>>
>>http://fortuna.iasi.rdsnet.ro/ccc/ccc.php?art_id=117970
>>
>>
>>I'm not 100% convinced by TD learning, but it certainly looks interesting.
>>
>>As I understand it TD learning basically uses the scores from the next few
>>positions to give a (hopefully) better estimate of the score for the current
>>position.  It then adjusts the eval weights so that the eval (or in the case of
>>TDLeaf, the eval of the position at the tip of the PV) moves towards the
>>estimate.
>>
>>OK, technically it uses all the remaining positions in the game for its score
>>estimate, but in practice this is heavily weighted towards the next few
>>positions.  It's a pretty cool idea really.
>>
>>One problem I see is that different features will be tuned at different rates.
>>Common features will of course be tuned quickly, while rare features will be
>>tuned slowly.  This is to some extent unavoidable, but maybe it makes sense to
>>slow the rate of change for the weights of common features before doing the
>>same for rare features.  Possibly a minor point, though.
>>
>>Peter
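The score estimate described in the quoted message (the remaining positions of the game, heavily weighted toward the next few) can be sketched as a standard TD(lambda) target: the current score plus a lambda-discounted sum of the one-step score differences that follow. This is a minimal illustration, not any particular engine's implementation, and the discount value is arbitrary:

```python
def td_targets(scores, lam=0.7):
    """Given the search scores at successive root positions of one game,
    return a TD(lambda) target for each position: the current score plus
    the lambda-discounted sum of future one-step score differences.
    With lam=1 every target is the final score; with lam=0 each target
    is simply the next position's score."""
    if not scores:
        return []
    n = len(scores)
    targets = [0.0] * n
    acc = 0.0
    # Walk backwards: acc accumulates discounted future TD errors.
    for t in range(n - 2, -1, -1):
        delta = scores[t + 1] - scores[t]      # one-step TD error
        acc = delta + lam * acc
        targets[t] = scores[t] + acc
    targets[n - 1] = scores[n - 1]             # last position: no lookahead
    return targets
```

The eval weights are then adjusted so that the eval of each position (or, for TDLeaf, of the PV tip position) moves toward its target.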


