Computer Chess Club Archives


Subject: TD Learning Preparation Concern

Author: Brian Richardson

Date: 15:33:17 10/24/04


I am again looking at adding TD learning to Tinker.
Thank you to those who have provided comments and suggestions thus far.

My concern is that the position on which the final score is based
must be saved, and saving it during the search seems awkward.  So
instead I walk down the PV as far as possible, and then look at the
position at the end.
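
A minimal sketch of the PV walk described above, on a toy game tree
rather than a real chess position.  Node, negamax, walk_pv and
leaf_score_from_root are hypothetical names made up for illustration;
none of this is Tinker's code.  It assumes static evals are given from
the side-to-move point of view, so the leaf eval has to be negated when
the PV has an odd number of moves before it can be compared with the
root score.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy position (hypothetical, stands in for a chess board)."""
    name: str
    static_eval: int = 0                          # side-to-move point of view
    children: dict = field(default_factory=dict)  # move -> Node


def negamax(node, depth):
    """Plain negamax, no pruning. Returns (score, pv) for the side to move."""
    if depth == 0 or not node.children:
        return node.static_eval, []
    best_score, best_pv = None, []
    for move, child in node.children.items():
        child_score, child_pv = negamax(child, depth - 1)
        score = -child_score
        if best_score is None or score > best_score:
            best_score, best_pv = score, [move] + child_pv
    return best_score, best_pv


def walk_pv(root, pv):
    """Follow the PV moves from the root and return the final position."""
    node = root
    for move in pv:
        node = node.children[move]
    return node


def leaf_score_from_root(root, pv):
    """Static eval of the PV leaf, converted to the root side's point of view."""
    leaf = walk_pv(root, pv)
    sign = 1 if len(pv) % 2 == 0 else -1
    return sign * leaf.static_eval


root = Node("root", children={
    "a": Node("A", children={"c": Node("Ac", 3), "d": Node("Ad", -1)}),
    "b": Node("B", children={"e": Node("Be", 5), "f": Node("Bf", 2)}),
})

score, pv = negamax(root, 2)
matches = leaf_score_from_root(root, pv) == score
```

With a complete PV and no quiescence search, the walked leaf and the
root score agree in this toy case.  A truncated PV would leave walk_pv
at an interior node, and the comparison would then be against the wrong
position.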

Unfortunately, the PV does not normally extend far enough, due
to the quiescence search or other search "instabilities".

The good news is that I found several PV and search bugs,
and things have improved.

The bad news is that for TD learning, sometimes
the final score does not match the evaluation score
for the walked PV position.

Then I tried matching the final score with a qsearch score from
the walked PV position.  This almost always matches, but not _all_
of the time.
For example, for Tinker, running
8/8/7k/8/4p1K1/8/5P2/8 b - - Fine16 bm e3
nothing matches after 12 ply, but then things stabilize and match again
for a while, and then there are more mismatches, and so on.
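
One possible way around walking the PV at all is to have the search
hand back the position its score came from, including through the
quiescence search.  The sketch below does this on a toy tree; Node,
qsearch, search and from_root are hypothetical names, there is no
alpha-beta or hashing, and evals are assumed to be from the
side-to-move point of view.  It is only meant to show why the eval at
the end of the main-search PV can differ from the final score while the
eval at the returned leaf does not.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy position (hypothetical) with quiet moves and captures kept apart."""
    static_eval: int                              # side-to-move point of view
    quiet: list = field(default_factory=list)     # child positions
    captures: list = field(default_factory=list)  # child positions


def qsearch(node, ply):
    """Stand pat, then try captures. Returns (score, leaf, leaf_ply)."""
    best, leaf, leaf_ply = node.static_eval, node, ply
    for child in node.captures:
        s, l, lp = qsearch(child, ply + 1)
        if -s > best:
            best, leaf, leaf_ply = -s, l, lp
    return best, leaf, leaf_ply


def search(node, depth, ply=0):
    """Plain negamax that drops into qsearch at the horizon."""
    moves = node.quiet + node.captures
    if depth == 0 or not moves:
        return qsearch(node, ply)
    best = None
    for child in moves:
        s, l, lp = search(child, depth - 1, ply + 1)
        if best is None or -s > best[0]:
            best = (-s, l, lp)
    return best


def from_root(value, ply):
    """Convert a side-to-move value at the given ply to the root's view."""
    return value if ply % 2 == 0 else -value


after_capture = Node(-9)                      # position after the capture
horizon = Node(4, captures=[after_capture])   # where the main-search PV ends
root = Node(0, quiet=[horizon])

score, leaf, leaf_ply = search(root, 1)
pv_end_eval = from_root(horizon.static_eval, 1)    # eval at the walked PV end
leaf_eval = from_root(leaf.static_eval, leaf_ply)  # eval at the returned leaf
```

Here the final score is backed up from a capture found in qsearch, so
pv_end_eval disagrees with score while leaf_eval agrees.  In a real
engine, hash cutoffs and pruning would complicate returning the leaf
this way.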

I have tried testing with and without any hashing, pawn hashing,
force stuffing the PV into the hash table after each iteration,
and some other basic things, but there just seem to be a few cases where it does
not match.

My question for those who have already added TD learning to their programs:
was this a problem for you, or do your engines perhaps have a "cleaner" PV?

I could just run with qsearch instead of eval, but of course that would add
quite a bit of time to the learning computation runs.

Thanks,
Brian




Last modified: Thu, 15 Apr 21 08:11:13 -0700
