Computer Chess Club Archives




Subject: Re: Parameter Tuning

Author: Jonathan Baxter

Date: 16:00:17 10/05/98


On October 05, 1998 at 13:53:31, Don Beal wrote:

>The learning rate is critical - it has to be as large as one dares
>for fast learning, but low for stable values.  We've been experimenting
>with methods for automatically adjusting the learning rate. (Higher
>rates if the adjustments go in the same direction, lower if they keep
>changing direction.)

This is similar to the use of "momentum" in training neural networks. It's an
example of a second-order method, like Newton's method, conjugate gradient,
Levenberg-Marquardt, etc. There is a monstrous literature on this for neural
nets. I always thought it wouldn't make a great deal of difference to go to
second order.
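The sign-based rate adjustment Don describes can be sketched roughly as follows. This is a hypothetical illustration (the function name and the growth/shrink factors are my own choices, not Don's actual values): each weight keeps its own rate, which grows while successive adjustments agree in sign and shrinks when they flip.

```python
# Sketch of per-weight adaptive learning rates: speed up while successive
# gradient signs agree, slow down when they flip. Factors are illustrative.
def adapt_rates(rates, grads, prev_grads, up=1.2, down=0.5,
                lo=1e-6, hi=1.0):
    new_rates = []
    for rate, g, pg in zip(rates, grads, prev_grads):
        if g * pg > 0:                      # same direction: grow the rate
            rate = min(rate * up, hi)
        elif g * pg < 0:                    # direction flipped: shrink it
            rate = max(rate * down, lo)
        new_rates.append(rate)              # g * pg == 0: leave unchanged
    return new_rates
```

This is essentially the classic delta-bar-delta idea, and the same sign rule later showed up in Rprop.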

>The other problem is learning weights for terms which only occur rarely.
>Then the learning process doesn't see enough examples to settle on
>good weights in a reasonable time.  I suspect this is the main limitation
>of the method, but it may be possible to devise ways to generate
>extra games which exercise the rare conditions.

This is called the "exploration/exploitation" tradeoff in Reinforcement
Learning. It's a tough question. The same problem arose in some experiments I
ran over the weekend: KnightCap was playing with PSTs and 10 stages (3x3 for
the castling options for each side, plus an ending stage). The
Q-side-castle/Q-side-castle stage was never seen in a few hundred games, so
those PSTs kept their zero values. But with on-line play you avoid the worst of
this problem, because your opponents tend to guide you to the relevant
positions. With self-play I think it is a real headache.
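One common way to force some coverage of rare stages in self-play is to make the engine occasionally play a random legal move instead of the best one. A minimal sketch of that idea, assuming moves arrive already scored (the function and parameter names are hypothetical, not anything from KnightCap):

```python
import random

# Epsilon-greedy move selection for self-play: with small probability,
# pick a random legal move so rarely seen stages (e.g. both sides
# castled queenside) occasionally get visited and their weights trained.
def choose_move(scored_moves, epsilon=0.1, rng=random):
    """scored_moves: list of (move, score) pairs for the side to move."""
    if rng.random() < epsilon:
        return rng.choice(scored_moves)[0]          # explore: random move
    return max(scored_moves, key=lambda ms: ms[1])[0]  # exploit: best move
```

In practice one might only dither in the opening, or seed games from hand-picked positions that exercise the rare conditions, which is closer to Don's suggestion of generating extra games.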


Jonathan Baxter



Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.