Computer Chess Club Archives



Subject: AI considerations

Author: Jay Scott

Date: 14:15:31 06/28/04


On June 28, 2004 at 15:19:16, Anthony Cozzie wrote:
>Genetic algorithms and comparison training work completely differently.
>
>Genetic algorithms: Make a random change.  Play X games.  If the modified
>version wins, keep the change.  Otherwise don't.  The problem with this is the
>huge number of games.  A small eval change might require 500 games to get right;
>even at 1 minute per game that is still 20+ hours.
>
>Comparison Training: Get a bunch of GM games.  Have the computer search each
>position, and find the "dominant position" (the position at the end of the PV).
>Compute the eval of the position after the best move vs the position after some
>other move.  If the best move has a better score, then all is good.  Otherwise
>penalize.  Then I use the simulated annealing algorithm to decide whether to
>keep the change.

You're conflating two independent things: the optimization algorithm and the
objective function it is optimizing. A genetic algorithm can optimize any
function, and there's no reason it can't be used with comparison training.
Simulated annealing can optimize any function too. In the last sentence you
hint at what you're really doing, but you don't explain it fully. It sounds
like you may be hill-climbing, with each step decided by a full simulated
annealing run?
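
To make the distinction concrete, here is a minimal sketch (my illustration,
with a made-up linear evaluation and made-up training examples, not anyone's
actual engine code). The comparison-training idea becomes an objective
function of the weights, and the optimizer driving it--here a trivial hill
climber--could just as well be a genetic algorithm or simulated annealing:

    import random

    def evaluate(weights, features):
        # linear evaluation: dot product of weights and position features
        return sum(w * f for w, f in zip(weights, features))

    def comparison_error(weights, examples):
        # the objective: count training positions where some alternative
        # move's position evaluates better than the one after the GM's move
        errors = 0
        for gm_features, other_features in examples:
            gm_score = evaluate(weights, gm_features)
            if any(evaluate(weights, f) > gm_score for f in other_features):
                errors += 1
        return errors

    def hill_climb(weights, examples, steps=1000):
        # one optimizer among many; a genetic algorithm or simulated
        # annealing could minimize the same comparison_error() unchanged
        best = comparison_error(weights, examples)
        for _ in range(steps):
            trial = list(weights)
            i = random.randrange(len(trial))
            trial[i] += random.choice((-1, 1))  # small tweak to one weight
            err = comparison_error(trial, examples)
            if err <= best:                     # keep changes that don't hurt
                weights, best = trial, err
        return weights

In a real engine each example's feature vectors would come from searching to
the PV leaf after the GM's move and after each alternative, as you describe.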

As you say, training against only the game result is extremely slow. That's no
fun.

I believe that training against GM games is not likely to work well. I'll be
glad if you can prove me wrong, but here is my reason:

* Computers and humans think too differently. Grandmasters know so much more
about the game that they make decisions based on considerations that search and
evaluation together do not understand. And conversely, the search looks at many
things that grandmasters do not care about. So tuning to play moves that are
statistically similar to good moves is not the same thing as tuning to play good
moves.

By training on each move rather than on each game, you're getting a lot more
training data and can learn faster. This principle can and should be extended
much, much further. For example, traditional temporal difference learning
(which is not a learning algorithm but an objective function that many
learning algorithms can use) learns from each move made in the game. KnightCap
extended the idea to learn from each move predicted in the principal
variation, giving better results. In principle, it should be possible to learn
at least from every node in the search tree that can be assigned an exact
score rather than a bound--and I am sure there are ways to exploit the bounds
too, though it's trickier. The search tree contains a huge mass of information
about the program's thinking, and the more of it that can be extracted for
learning, the better.
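
To sketch the KnightCap-style idea (an illustration only, assuming a linear
evaluation and taking the PV-leaf feature vectors as given--this is not
KnightCap's actual code), the differences between successive PV-leaf scores
drive the weight update:

    def tdleaf_update(weights, leaf_features, alpha=0.01, lam=0.7):
        # leaf_features[t] is the feature vector of the PV leaf found when
        # searching the position reached after move t of the game
        scores = [sum(w * f for w, f in zip(weights, feats))
                  for feats in leaf_features]
        deltas = [scores[t + 1] - scores[t] for t in range(len(scores) - 1)]
        new_weights = list(weights)
        for t in range(len(deltas)):
            # discounted sum of the temporal differences from step t onward
            td = sum(lam ** (j - t) * deltas[j]
                     for j in range(t, len(deltas)))
            # for a linear eval, the gradient with respect to the weights is
            # just the feature vector of the leaf at step t
            for i, f in enumerate(leaf_features[t]):
                new_weights[i] += alpha * f * td
        return new_weights

Learning from every exact-scored node in the tree would mean feeding many more
(features, score) pairs per search through an update of this kind, instead of
one PV leaf per move of the game.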

  Jay


