Computer Chess Club Archives



Subject: Re: Hello from Edmonton (and on Temporal Differences)

Author: Sune Fischer

Date: 13:58:02 08/05/02



On August 05, 2002 at 16:24:17, Vincent Diepeveen wrote:
>>
>>This is a known problem.  If your eval doesn't have an important term in it,
>>then trying to tune it over a large set of positions will lead to trouble,
>>because you will try to cover up the "hole" by tweaking the wrong scores...

It doesn't work that way: the tuning is incremental, so you play a game between
each adjustment rather than running through a test set. The evaluator is given a
'reward' at the end position (1 for a win, 0 for a draw, -1 for a loss), and you
'teach' it what is good and what is bad by adjusting the i'th evaluation based
on the (i+1)'th value.
It won't work if the program loses all the time; that would be like constantly
telling it "no no no, that is bad". Sometimes you also need to tell it what is
good, so there has to be a balance.
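
Just to make concrete the kind of update I'm talking about, here is a rough
sketch of a TD(lambda) weight update for a simple linear eval squashed through
tanh. The function names, the feature-vector representation and the structure
are my own illustration, not KnightCap's actual code:

#include <cmath>
#include <vector>

double eval(const std::vector<double>& x, const std::vector<double>& w) {
    double s = 0.0;
    for (std::size_t j = 0; j < w.size(); ++j) s += w[j] * x[j];
    return std::tanh(s);   // squash into (-1, 1), same range as the outcome
}

// positions[t] is the feature vector at move t (in TDLeaf, the leaf of the
// principal variation); outcome is +1 for a win, 0 for a draw, -1 for a loss.
void tdUpdate(std::vector<double>& w,
              const std::vector<std::vector<double>>& positions,
              double outcome, double alpha, double lambda) {
    const std::size_t N = positions.size();
    if (N < 2) return;
    std::vector<double> V(N);
    for (std::size_t t = 0; t < N; ++t) V[t] = eval(positions[t], w);
    V[N - 1] = outcome;   // the terminal "evaluation" is the game result

    for (std::size_t t = 0; t + 1 < N; ++t) {
        // lambda-discounted sum of future temporal differences V[k+1] - V[k]
        double tdSum = 0.0, decay = 1.0;
        for (std::size_t k = t; k + 1 < N; ++k) {
            tdSum += decay * (V[k + 1] - V[k]);
            decay *= lambda;
        }
        // d tanh(w.x)/dw_j = (1 - V^2) * x_j, so only features present in the
        // position pull their weights up or down
        const double dV = 1.0 - V[t] * V[t];
        for (std::size_t j = 0; j < w.size(); ++j)
            w[j] += alpha * dV * positions[t][j] * tdSum;
    }
}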

>>Perhaps once you are convinced you are evaluating most of the really important
>>things, this might work.  I'm a long way from that point myself...

It sounds like a strange problem to me; I think I would have to see it for
myself to believe it. I would expect the temporal difference to be 0 if a
knowledge term is missing, so I don't know what could be going on.

>This is *exactly* why I say that lacking domain-dependent knowledge means
>that it's impossible to tune under O(n log n).
>
>If you lack knowledge then each parameter must simply be tried separately
>somehow, because there is no proof that a different combination of patterns
>will do worse.

It will adjust _all_ the weights that make a contribution to the eval.
If they do _not_ contribute, they won't get adjusted (the partial derivative
will be 0).
So, let's say the difference between position i and position i+1 is one knight
move. TDLeaf will adjust the mobility terms for the pieces involved, the knight
eval terms, the two piece-square terms and whatever else you have.
Some of these probably shouldn't have been adjusted, or are adjusted in the
wrong direction, but then next time they should adjust right back. On average
they should settle on the right values if you decrease the learning rate at the
proper speed.
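
To illustrate both points (zero partial derivative for non-contributing terms,
and decreasing the learning rate), a small example; the decay schedule and the
constants are just something plausible, not what KnightCap actually does:

// With a linear eval V = sum_j w_j * x_j, the partial dV/dw_j is just x_j,
// so a weight whose feature is zero in a position gets no update from it.
// Slowly decaying the step size lets the noisy per-game corrections average
// out instead of chasing the latest result.
double learningRate(double alpha0, double tau, int gamesPlayed) {
    return alpha0 / (1.0 + gamesPlayed / tau);   // e.g. alpha0 = 0.1, tau = 500
}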

>I consider Tao's evaluation a lot better than that of Crafty, perhaps not
>so well tuned yet, but obviously it's not nice to say that because
>his evaluation sucks, TD has problems. Instead I would simply say that TD
>isn't working.
>
>It works a bit better than random tuning, but that's about it!
>
>I am sure that if I see an entire evaluation of a decent program and can tune
>it, with just running a few test positions I select myself, that after
>tuning, a TD tuning program will NOT be capable of doing better

Times change. Once there were people who didn't believe in nullmove, and some
who didn't believe in pruning. Conventional wisdom is not always right; in
fact, clinging to it makes progress impossible.

I don't think it is easy to do correctly, or that everything you need is right
there in KnightCap, but with a few modifications here and there....

>Now i'm most likely not even worlds best tuner, but the ones that are,
>they probably laugh so loud about TD that they don't even post here, just
>read it.

I think it is more likely that they laugh at those still doing it manually :)

-S.
>Best regards,
>Vincent


