Author: Don Beal
Date: 13:47:50 10/25/04
Go up one level in this thread
On October 24, 2004 at 18:33:17, Brian Richardson wrote:

>The bad news is that for TD learning, sometimes
>the final score does not match the evaluation score
>for the walked PV position.
>
>Then I tried matching the final score with a qsearch score from
>the walked PV position. This almost always matches, but not _all_
>of the time.
>For example, for Tinker, running
>8/8/7k/8/4p1K1/8/5P2/8 b - - Fine16 bm e3
>nothing matches after 12 ply, but then things stabilize and match again
>for a while, and then there are more mismatches, and so on.
>
>I have tried testing with and without any hashing, pawn hashing,
>force-stuffing the PV into the hash table after each iteration,
>and some other basic things, but there still seem to be a few cases where it
>does not match.
>
>My question is for those who have already added TD learning to their programs:
>was this a problem, or perhaps your engines have a "cleaner" PV?
>
>I could just run with qsearch instead of eval, but of course that would add
>quite a bit of time to the learning computation runs.

As you say, for TD learning one wants to compute weight adjustments based on the position that gave rise to the score. (I call this the position "selected" by the search; it is the position at the end of the complete PV.)

In general the scores, even from deep searches, will be wrong some of the time, and hence many of the weight adjustments will be bad. The learning process still works if the "good" adjustments outweigh the "bad" ones, which they normally do. So a few extra bad adjustments from non-selected positions can perhaps be tolerated.

It is possible to write code that backs up a move sequence together with the score, or backs up the selected position itself together with the score. This ensures that the position that gets adjusted is the one that gave rise to the score. I do this, as I prefer to avoid the concern you raised.
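A minimal sketch of the idea of backing up the selected position alongside the score (this is illustrative only, not Tinker's or any real engine's code; the game here is a toy tree, and all function and variable names are made up for the example):

```python
# Sketch: a negamax search that returns (score, selected_position), where
# selected_position is the leaf at the end of the PV that produced the score.
# Applying TD weight adjustments to this position guarantees the adjusted
# position is the one the score actually came from.

def negamax(pos, depth, evaluate, moves, apply_move):
    """Return (score, selected_position) from pos's point of view."""
    legal = moves(pos)
    if depth == 0 or not legal:
        return evaluate(pos), pos          # leaf: pos itself is "selected"
    best_score, best_leaf = None, None
    for m in legal:
        child = apply_move(pos, m)
        score, leaf = negamax(child, depth - 1, evaluate, moves, apply_move)
        score = -score                     # negamax sign flip
        if best_score is None or score > best_score:
            # Back up the leaf position together with its score.
            best_score, best_leaf = score, leaf
    return best_score, best_leaf

# Toy "game": positions are strings, moves lead to successor strings,
# and the evaluation of a terminal position is just a fixed number.
tree = {"r": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"]}
moves = lambda p: tree.get(p, [])
apply_move = lambda p, m: m
evaluate = lambda p: {"a1": 3, "a2": -1, "b1": 5}.get(p, 0)

score, selected = negamax("r", 2, evaluate, moves, apply_move)
# score is the minimax value; selected is the PV-terminal position,
# i.e. the position whose evaluation the score is built from.
```

The alternative of backing up the move sequence (the PV itself) and replaying it afterwards gives the same result; returning the position directly just avoids any chance of the replayed line diverging from the line the search actually scored.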
You say it is awkward, and the handling of move sequences can indeed be tricky to get right, but backing up positions is straightforward once you accept the copying cost (about 34 bytes for a simple representation).

Backing up a move sequence or position will slow the learning runs down somewhat, but unless you are using very sophisticated scoring terms that are only valuable at deep searches, you should be able to learn with shallower searches than you use for regular play, so the slowdown should be tolerable.

Your search for learning should use the qsearch. The qsearch should also determine and back up a selected position, so that the depth-0 nodes always have a selected position to pass up into the main search.
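Continuing the toy sketch above, the quiescence search can back up a selected position in exactly the same way; when the side to move stands pat, the current position itself is the selected one (again, all names here are illustrative, not from any real engine):

```python
# Sketch: a quiescence search that also returns (score, selected_position),
# so depth-0 nodes of the main search always have a selected position to
# back up, even when the score comes from a capture line beyond the horizon.

def qsearch(pos, evaluate, captures, apply_move):
    """Return (score, selected_position) from pos's point of view."""
    stand_pat = evaluate(pos)
    best_score, best_pos = stand_pat, pos  # standing pat selects pos itself
    for m in captures(pos):
        child = apply_move(pos, m)
        score, leaf = qsearch(child, evaluate, captures, apply_move)
        score = -score                     # negamax sign flip
        if score > best_score:
            # A capture line scored better: back up its terminal position.
            best_score, best_pos = score, leaf
    return best_score, best_pos

# Toy example: from "q" there are two captures; "c1" and "c2" are quiet.
captures = lambda p: {"q": ["c1", "c2"]}.get(p, [])
apply_move = lambda p, m: m
evaluate = lambda p: {"q": 0, "c1": -4, "c2": -1}.get(p, 0)

score, selected = qsearch("q", evaluate, captures, apply_move)
# The capture to c1 scores -(-4) = 4, better than standing pat at 0,
# so the search selects c1's quiet position, not "q" itself.
```

This directly addresses the mismatch Brian observed: if the TD target is a qsearch score but the adjusted position is the raw PV-walk position, they can disagree exactly when a tactical line past the horizon decides the score.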