Computer Chess Club Archives



Subject: Re: Hello from Edmonton (and on Temporal Differences)

Author: Vincent Diepeveen

Date: 14:18:24 08/05/02



On August 05, 2002 at 16:58:02, Sune Fischer wrote:

>On August 05, 2002 at 16:24:17, Vincent Diepeveen wrote:
>>>
>>>This is a known problem.  If your eval doesn't have an important term in it,
>>>then trying to tune it over a large set of positions will lead to trouble,
>>>because you will try to cover up the "hole" by tweaking the wrong scores...
>
>It doesn't work that way; it's incremental tuning, so you must play a game
>between each tuning step, not run through a test set. The evaluator is given
>a 'reward' at the end position (1 for a win, 0 for a draw, -1 for a loss),
>and you 'teach' it what is good and what is bad by adjusting the i'th values
>based on the i+1'th value.
>It won't work if it loses all the time; that would be like telling it "no no
>no, that is bad" constantly. Sometimes you need to tell it what is good;
>there has to be a balance.
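
For concreteness, the update being described looks roughly like the sketch
below: one TD(lambda) pass over a finished game, assuming a simple linear
evaluator eval = sum over k of w[k]*f[k]. All names and sizes here are
illustrative, not taken from KnightCap or any real engine.

    /* Minimal TD(lambda) learning pass over one finished game of n
     * positions, for a linear eval.  The caller fills f[t][] and v[t]
     * for t < n-1 with the features and eval score of each position.
     * outcome is +1 for a win, 0 for a draw, -1 for a loss, exactly
     * the 'reward' described above. */

    #define NTERMS  4      /* hypothetical number of eval parameters */
    #define MAXPLY  512    /* maximum game length in positions       */

    double w[NTERMS];             /* the weights being tuned          */
    double f[MAXPLY][NTERMS];     /* feature vector of each position  */
    double v[MAXPLY];             /* eval score of each position      */

    void td_update(int n, double outcome, double alpha, double lambda)
    {
        v[n - 1] = outcome;            /* terminal value = game result */
        for (int t = 0; t < n - 1; t++) {
            /* lambda-weighted sum of all later temporal differences,
             * i.e. pull the t'th value toward the (t+1)'th and beyond */
            double sum = 0.0, decay = 1.0;
            for (int j = t; j < n - 1; j++) {
                sum += decay * (v[j + 1] - v[j]);
                decay *= lambda;
            }
            /* for a linear eval, d(eval)/d(w[k]) at position t is just
             * f[t][k], so every term active in the position is moved  */
            for (int k = 0; k < NTERMS; k++)
                w[k] += alpha * f[t][k] * sum;
        }
    }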
>
>>>Perhaps once you are convinced you are evaluating most of the really important
>>>things, this might work.  I'm a long way from that point myself...
>
>It sounds to me like a strange problem; I think I will have to see it for
>myself to believe it. I would expect the temporal difference to be 0 if there
>is a knowledge term missing, so I don't know what could be going on.
>
>>This is *exactly* why I say that lacking domain-dependent knowledge means
>>it's impossible to tune in under O(n log n).
>>
>>If you lack knowledge, then each parameter must somehow be tried separately,
>>simply because there is no proof that a different combination of patterns
>>will do worse.
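
To make the contrast concrete, tuning one parameter at a time would look
something like the sketch below: freeze all the other weights, try a handful
of candidate values for one term, and keep whichever scores best over a test
match. play_match() here is a hypothetical stand-in for however you measure
strength.

    /* Per-parameter tuning sketch: all other weights stay fixed while
     * candidate values for weight k are compared by match result. */

    extern double w[];                  /* eval weights               */
    extern double play_match(void);     /* hypothetical: match score  */

    void tune_one(int k, const double *cand, int ncand)
    {
        double best_score = -1e9, best_value = w[k];
        for (int c = 0; c < ncand; c++) {
            w[k] = cand[c];             /* try this candidate value   */
            double s = play_match();    /* measure it over games      */
            if (s > best_score) {
                best_score = s;
                best_value = cand[c];
            }
        }
        w[k] = best_value;              /* keep the winner            */
    }

With n parameters this means on the order of n separate match experiments,
which is the per-parameter cost being argued about above.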
>
>It will adjust _all_ the weights that make a contribution in the eval.

I know, but that is exactly the problem.

*That* isn't working, because it doesn't know what it is adjusting,
so it doesn't draw the right conclusions at all. In fact, an infinite
run of TD learning will only by random luck manage to find out
what a good parameter set is; that's exactly the problem here.

Drawing conclusions in chess is a problem anyway, because the result of
a game does not always determine whether something is better or worse.

  THE REAL PROBLEM IN CHESS:

Suppose it happens that the tuner (a TD neural network or whatever,
as long as it's optimizing a bunch of parameters at the same time)
has by accident chosen a starting set where *all* my parameters are
tuned very well except the open-file bonus. Instead of a positive
+0.50 it has set it to a negative -0.50.

In today's chess that means a sure defeat.

So the learner will draw the wrong conclusion, because in the
next run, where it tunes another x parameters wrong, the randomness
of the positions makes the defeat less certain.

It is a trivial fact in chess that if you play for random positions, the
chance that you win or draw is bigger than with a good program that has one
huge problem: the randomness of a position gives you a small chance to
confuse the opponent, so the loss doesn't follow by natural induction.

To explain this: if two players try to achieve nearly 100% the same thing,
then obviously, if one thing is completely *dead* wrong, you lose without a
chance.

If two programs are *completely different* from each other, then this chance
is smaller.

It is here that it becomes clear that domain-dependent knowledge is required.

Now, if you make an autotuner, you are not going to 'guide' it on each run;
you have a tuner out of LAZINESS, because you can do a better job yourself
anyway.


>If they do _not_ contribute, they won't get adjusted (the partial derivative
>will be 0).
>So, let's say the difference between position i and position i+1 is one knight
>move. TDLeaf will adjust the mobility for the pieces involved, it will adjust
>the knight eval terms, the two piece-square terms, and whatever else you have.
>Some of these probably shouldn't have been adjusted, or are adjusted in the
>wrong direction, but then next time they should adjust right back. On average
>they should settle down on the right values if you decrease the learning rate
>at the proper speed.
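
Those two points (a zero partial derivative for terms that don't contribute,
and a learning rate decreased at the proper speed) can be sketched in a few
lines, again assuming a linear eval where the derivative with respect to
w[k] is just the feature value f[k]; the schedule constants are illustrative.

    /* A term absent from the position (f[k] == 0) has a zero partial
     * derivative and therefore receives no adjustment at all.  The
     * step size follows a standard decaying 1/t schedule. */

    void adjust(double *w, const double *f, int nterms,
                double td_error, long games_played)
    {
        double alpha = 1.0 / (100.0 + (double)games_played);
        for (int k = 0; k < nterms; k++)
            w[k] += alpha * f[k] * td_error;   /* f[k]==0 -> no change */
    }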
>
>>I consider Tao's evaluation a lot better than that of Crafty, perhaps not
>>so well tuned yet, but obviously it's not nice to say that TD has problems
>>because his evaluation sucks. Instead I would say TD simply isn't working.
>>
>>It works a bit better than random tuning, but that's about it!
>>
>>I am sure that if I see the entire evaluation of a decent program and can
>>tune it, just by running a few test positions I select myself, then after
>>that tuning a TD tuning program will NOT be capable of doing better.
>
>Times change. Once there were people who didn't believe in nullmove, and some
>who didn't believe in pruning. Conventional wisdom is not always right; in
>fact, it can make progress impossible.
>
>I don't think it is easy to do correctly, or that everything you need is
>right there in KnightCap, but with a few modifications here and there....
>
>>Now I'm most likely not even the world's best tuner, but the ones who are
>>probably laugh so loudly about TD that they don't even post here; they just
>>read it.
>
>I think it is more likely that they laugh at those still doing it manually :)
>
>-S.
>>Best regards,
>>Vincent


