Author: Don Beal
Date: 13:47:50 10/25/04
Go up one level in this thread
On October 24, 2004 at 18:33:17, Brian Richardson wrote:

>The bad news is that for TD learning, sometimes
>the final score does not match the evaluation score
>for the walked PV position.
>
>Then I tried matching the final score with a qsearch score from
>the walked PV position. This almost always matches, but not _all_
>of the time.
>For example, for Tinker, running
>8/8/7k/8/4p1K1/8/5P2/8 b - - Fine16 bm e3
>nothing matches after 12 ply, but then things stabilize and match again
>for a while, and then there are more mismatches, and so on.
>
>I have tried testing with and without any hashing, pawn hashing,
>force-stuffing the PV into the hash table after each iteration,
>and some other basic things, but there still seem to be a few cases where it
>does not match.
>
>My question is for those who have already added TD learning to their programs:
>was this a problem, or perhaps your engines have a "cleaner" PV?
>
>I could just run with qsearch instead of eval, but of course that would add
>quite a bit of time to the learning computation runs.

As you say, for TD learning one wants to compute weight adjustments based on the position that gave rise to the score. (I call this the position "selected" by the search; it is the position at the end of the complete PV.)

In general the scores, even from deep searches, will be wrong some of the time, and hence many of the weight adjustments will be bad. The learning process still works if the "good" adjustments outweigh the "bad" ones, which they normally do. So a few extra bad adjustments from non-selected positions can perhaps be tolerated.

It is possible to write code that backs up a move sequence together with the score, or backs up the selected position itself together with the score. This ensures that the position that gets adjusted is the one that gave rise to the score. I do this, as I prefer to avoid the concern you raised.
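A minimal sketch of the idea of backing up the selected position alongside the score (this is illustrative only, not Tinker's or any real engine's code; the game here is a toy tree, and all function and variable names are made up for the example):

```python
# Sketch: a negamax search that returns (score, selected_position), where
# selected_position is the leaf at the end of the PV that produced the score.
# Applying TD weight adjustments to this position guarantees the adjusted
# position is the one the score actually came from.

def negamax(pos, depth, evaluate, moves, apply_move):
    """Return (score, selected_position) from pos's point of view."""
    legal = moves(pos)
    if depth == 0 or not legal:
        return evaluate(pos), pos          # leaf: pos itself is "selected"
    best_score, best_leaf = None, None
    for m in legal:
        child = apply_move(pos, m)
        score, leaf = negamax(child, depth - 1, evaluate, moves, apply_move)
        score = -score                     # negamax sign flip
        if best_score is None or score > best_score:
            # Back up the leaf position together with its score.
            best_score, best_leaf = score, leaf
    return best_score, best_leaf

# Toy "game": positions are strings, moves lead to successor strings,
# and the evaluation of a terminal position is just a fixed number.
tree = {"r": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"]}
moves = lambda p: tree.get(p, [])
apply_move = lambda p, m: m
evaluate = lambda p: {"a1": 3, "a2": -1, "b1": 5}.get(p, 0)

score, selected = negamax("r", 2, evaluate, moves, apply_move)
# score is the minimax value; selected is the PV-terminal position,
# i.e. the position whose evaluation the score is built from.
```

The alternative of backing up the move sequence (the PV itself) and replaying it afterwards gives the same result; returning the position directly just avoids any chance of the replayed line diverging from the line the search actually scored.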
You say it is awkward, and the handling of move sequences can indeed be tricky to get right, but backing up positions is straightforward once you accept the copying cost (about 34 bytes for a simple representation).

Backing up a move sequence or position will slow the learning runs down somewhat, but unless you are using very sophisticated scoring terms that are only valuable at deep searches, you should be able to learn with shallower searches than you use for regular play, so the slowdown should be tolerable.

Your search for learning should use the qsearch. The qsearch should also determine and back up a selected position, so that the depth-0 nodes always have a selected position to pass up into the main search.
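Continuing the toy sketch above, the quiescence search can back up a selected position in exactly the same way; when the side to move stands pat, the current position itself is the selected one (again, all names here are illustrative, not from any real engine):

```python
# Sketch: a quiescence search that also returns (score, selected_position),
# so depth-0 nodes of the main search always have a selected position to
# back up, even when the score comes from a capture line beyond the horizon.

def qsearch(pos, evaluate, captures, apply_move):
    """Return (score, selected_position) from pos's point of view."""
    stand_pat = evaluate(pos)
    best_score, best_pos = stand_pat, pos  # standing pat selects pos itself
    for m in captures(pos):
        child = apply_move(pos, m)
        score, leaf = qsearch(child, evaluate, captures, apply_move)
        score = -score                     # negamax sign flip
        if score > best_score:
            # A capture line scored better: back up its terminal position.
            best_score, best_pos = score, leaf
    return best_score, best_pos

# Toy example: from "q" there are two captures; "c1" and "c2" are quiet.
captures = lambda p: {"q": ["c1", "c2"]}.get(p, [])
apply_move = lambda p, m: m
evaluate = lambda p: {"q": 0, "c1": -4, "c2": -1}.get(p, 0)

score, selected = qsearch("q", evaluate, captures, apply_move)
# The capture to c1 scores -(-4) = 4, better than standing pat at 0,
# so the search selects c1's quiet position, not "q" itself.
```

This directly addresses the mismatch Brian observed: if the TD target is a qsearch score but the adjusted position is the raw PV-walk position, they can disagree exactly when a tactical line past the horizon decides the score.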