Computer Chess Club Archives



Subject: Re: Temporal Difference

Author: Bas Hamstra

Date: 09:11:32 01/06/01



Hi James!

On January 06, 2001 at 11:48:44, James Swafford wrote:

>On January 05, 2001 at 17:36:55, Bas Hamstra wrote:
>
>You're a little ahead of me.  I've been wanting to do some work with
>TD for some time, but I've still got a couple months of work to do
>before I even get started.  I do have a couple questions for you,
>though:
>
>1.  Did you start with realistic weights, or did you begin with
>random values, or ???

I started with all weights at zero. To check that it did something reasonable, I
let it play against a pool of 100 slightly randomized hand-tuned evals. You don't
need 100 programs, just 100 weight sets and one program. It then learns to score
50% within 200 or 300 games. I plotted the weights in real time in a graph to see
how they developed. This was my first test. The second step was to put it in my
console-based "real" program, and it is playing Crafty right now.
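
A minimal sketch of the "100 weight sets and one program" idea, in Python. The term values, the 10% jitter, and the helper name make_opponent_pool are assumptions for illustration; the post does not say how the hand-tuned evals were randomized.

```python
import random

def make_opponent_pool(base_weights, n_sets=100, jitter=0.1, seed=0):
    """Return n_sets copies of base_weights, each value perturbed by up to +/- jitter.

    The uniform relative jitter is an assumption; any small perturbation
    of a hand-tuned weight set would serve the same purpose.
    """
    rng = random.Random(seed)
    pool = []
    for _ in range(n_sets):
        perturbed = {
            name: value * (1.0 + rng.uniform(-jitter, jitter))
            for name, value in base_weights.items()
        }
        pool.append(perturbed)
    return pool

# Hypothetical hand-tuned values, used only to make the sketch runnable.
base = {"BISHOPMOBILITY": 4.0, "DOUBLEDPAWN": -12.0}

pool = make_opponent_pool(base)            # 100 opponents, one program
learner = {name: 0.0 for name in base}     # the learner starts with all weights zero
```

The same engine binary can then load a different weight set from the pool for each game, while the learner's own weights are the only ones being updated.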

>2.  What do you mean by "wrong trend?"  I suppose you mean a term
>is "drifting" the wrong way... becoming more negative when it should
>be going more positive?

Yep. Say 90% of the weights tend to show reasonable values. But a few don't at
all. It might be that it needs more games, though. I am not sure if this is the
fastest way of automatic parameter tuning. Maybe some kind of "weight fitting"
on a large set of positions is more efficient, but how?

>3.  How are you training your evaluator?  With a wide variety of
>opponents, or by playing the same programs over and over, or ???
>How many games have you played?

Right now I am playing Crafty for a couple of hundred 1 0 games.

>4.  Does your engine compete on ICC?

A couple of times. But mostly FICS. The TD version has not played there yet. By
the way: I compared what it learns from a) wins b) losses c) draws. In my
opinion a) and c) did not do well. So now I only learn from losses. Never change
a winning team.
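
Learning only from losses amounts to a filter in front of the weight update. A small sketch, assuming each game record carries a "result" field from the learner's point of view (the record layout and the function name are hypothetical):

```python
def select_training_games(games, learn_from=("loss",)):
    """Keep only the games whose result is in learn_from."""
    return [g for g in games if g["result"] in learn_from]

# Hypothetical game records; a real engine would also store the positions
# and evaluations needed for the TD update.
games = [
    {"id": 1, "result": "win"},
    {"id": 2, "result": "loss"},
    {"id": 3, "result": "draw"},
    {"id": 4, "result": "loss"},
]

training_games = select_training_games(games)
```

Passing learn_from=("win", "loss", "draw") restores learning from every game, which makes it easy to compare the three cases.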

>James

Let me know your experiences, I am interested!

Ciao!
Bas.

>
>
>>I would like to share experiences with others who have tried Temporal Difference
>>learning. Currently one of the problems I see is that, for example, BISHOPMOBILITY
>>has very large partial derivatives. So the updates for this term swing wildly
>>and distort learning. On the other hand, if I reduce the learning factor to
>>bring this into proportion, a term like DOUBLEDPAWN will never reach a realistic
>>value.
>>
>>One way to do better is to work with derivatives of -1 or +1 only, depending on
>>whether the partial derivative for a term is below or above zero. This results in
>>a tendency toward realistic values for most terms.
>>
>>Still, a few terms refuse to show a "trend" at all, or even show the wrong trend.
>>Is anyone else having this problem?
>>
>>(To see what is going on, I plotted the development of the weights in a graph to
>>verify that it does something useful. Best results so far are with -1/+1 only;
>>results are bad when using the real derivatives.)
>>
>>
>>Regards,
>>Bas.
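
The -1/+1 idea from the quoted message can be illustrated with a simplified one-step weight update. This is a sketch, not the code from the post: the single-step form, the function names, and all numbers are assumptions, and a full TD(lambda) update would sum discounted errors over the positions of a game.

```python
def sign(x):
    """Return -1, 0 or +1 according to the sign of x."""
    return (x > 0) - (x < 0)

def td_step(weights, grads, td_error, alpha, sign_only):
    """One simplified update: w += alpha * td_error * g for every term.

    grads holds the partial derivative of the eval with respect to each weight.
    With sign_only=True, each derivative is replaced by its sign.
    """
    updated = {}
    for name, w in weights.items():
        g = grads[name]
        if sign_only:
            g = sign(g)
        updated[name] = w + alpha * td_error * g
    return updated

# Hypothetical numbers: a large derivative for one term, a small one for the other.
weights = {"BISHOPMOBILITY": 0.0, "DOUBLEDPAWN": 0.0}
grads = {"BISHOPMOBILITY": 20.0, "DOUBLEDPAWN": -0.5}

raw = td_step(weights, grads, td_error=1.0, alpha=0.01, sign_only=False)
clipped = td_step(weights, grads, td_error=1.0, alpha=0.01, sign_only=True)
```

With the raw derivatives, the BISHOPMOBILITY step here is 40 times larger than the DOUBLEDPAWN step, so no single learning factor suits both. With signs only, every term moves by the same amount, alpha * |td_error|, and only the direction depends on the derivative.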




Last modified: Thu, 15 Apr 21 08:11:13 -0700
