Author: Jay Scott
Date: 14:15:31 06/28/04
On June 28, 2004 at 15:19:16, Anthony Cozzie wrote:

>Genetic algorithms and comparison training work completely differently.
>
>Genetic algorithms: Make a random change. Play X games. If the modified
>version wins, keep the change. Otherwise don't. The problem with this is the
>huge number of games. A small eval change might require 500 games to get right;
>even at 1 minute per game that is still 20+ hours.
>
>Comparison Training: Get a bunch of GM games. Have the computer search each
>position, and find the "dominant position" (the position at the end of the PV).
>Compute the eval of the position after the best move vs the position after some
>other move. If the best move has a better score, then all is good. Otherwise
>penalize. Then I use the simulated annealing algorithm to decide whether to
>keep the change.

You're conflating two independent things: the optimization algorithm and the objective function that it is optimizing. A genetic algorithm can optimize any function, and there's no reason it can't be used with comparison training. Simulated annealing can optimize any function too. In the last sentence you hint at what you're really doing, but you don't explain it fully. It sounds like you're maybe hillclimbing with each step decided by a full simulated annealing run?

As you say, training against the game result only is extremely slow. That's no fun.

I believe that training against GM games is not likely to work well. I'll be glad if you can prove me wrong, but here is my reason:

* Computers and humans think too differently. Grandmasters know so much more about the game that they make decisions based on considerations that search and evaluation together do not understand. And conversely, the search looks at many things that grandmasters do not care about. So tuning to play moves that are statistically similar to good moves is not the same thing as tuning to play good moves.

By training on each move rather than on each game, you're getting a lot more training data and can learn faster. This principle can and should be extended much, much further. For example, traditional temporal difference learning (which is not a learning algorithm; it is an objective function that many learning algorithms can use) learns from each move made in the game. KnightCap extended the idea to learn from each move predicted in the principal variation, giving better results. In principle, it should be possible to learn at least from every node in the search tree which can be assigned an exact score rather than a bound--and I am sure that there are ways to exploit the bounds too, though it's trickier. The search tree contains a huge mass of information about the program's thinking, and the more of it that can be extracted for learning, the better.

Jay
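To make the "optimizer vs. objective" point concrete, here is a minimal sketch showing the same comparison-training error function driven by either simulated annealing or a genetic algorithm. This is not code from either poster's program: the feature-vector data layout, the linear dot-product evaluation, and all function names are assumptions made purely for illustration.

```python
# Sketch: one comparison-training objective, two interchangeable optimizers.
# Assumptions (not from the original post): positions are reduced to feature
# vectors, evaluation is a linear dot product, training examples are pairs of
# (features after the GM move, [features after each alternative move]).
import math
import random

def evaluate(features, weights):
    """Linear evaluation: dot product of feature vector and weight vector."""
    return sum(f * w for f, w in zip(features, weights))

def comparison_error(weights, training_set):
    """Count positions where some alternative move out-scores the GM move."""
    errors = 0
    for gm_features, alternatives in training_set:
        gm_score = evaluate(gm_features, weights)
        errors += sum(1 for alt in alternatives
                      if evaluate(alt, weights) > gm_score)
    return errors

def mutate(weights, scale=0.1):
    """Perturb one randomly chosen weight."""
    w = list(weights)
    w[random.randrange(len(w))] += random.gauss(0, scale)
    return w

def anneal(weights, training_set, steps=10000, t0=1.0):
    """Simulated annealing over the comparison-training objective."""
    current = comparison_error(weights, training_set)
    for step in range(steps):
        t = max(t0 * (1 - step / steps), 1e-9)   # linear cooling schedule
        candidate = mutate(weights)
        cand_err = comparison_error(candidate, training_set)
        # Accept improvements always, regressions with Boltzmann probability.
        if cand_err <= current or random.random() < math.exp((current - cand_err) / t):
            weights, current = candidate, cand_err
    return weights

def genetic(population, training_set, generations=200):
    """A genetic algorithm over the very same objective."""
    for _ in range(generations):
        population.sort(key=lambda w: comparison_error(w, training_set))
        parents = population[:len(population) // 2]
        # Uniform crossover of paired parents, followed by mutation.
        children = [mutate([random.choice(pair) for pair in zip(p, q)])
                    for p, q in zip(parents, reversed(parents))]
        population = parents + children
    return min(population, key=lambda w: comparison_error(w, training_set))
```

Both optimizers only ever see `comparison_error`; swapping one for the other changes nothing about what is being measured, which is exactly why the choice of objective (game results vs. agreement with GM moves vs. something else) is the separate and more important decision.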
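And here is a hedged sketch of the idea in the last paragraph: harvesting every node in the search tree that carries an exact score (rather than a fail-high/fail-low bound) as a training example, with the backed-up search score acting as the teacher for the static evaluation. The `SearchNode` structure, the linear eval, and the gradient step are assumptions for illustration; real engines store and tune this very differently.

```python
# Sketch: treat every exact-scored node in the search tree as a training
# example, pulling the static eval toward the deeper search's backed-up score.
# SearchNode, static_eval, and the update rule are illustrative assumptions.
from dataclasses import dataclass, field
from typing import List

@dataclass
class SearchNode:
    features: List[float]          # feature vector of the position at this node
    score: float                   # backed-up search score for this node
    exact: bool                    # True if the score is exact, False if only a bound
    children: List["SearchNode"] = field(default_factory=list)

def exact_nodes(root: SearchNode):
    """Yield every node in the tree whose score is exact."""
    stack = [root]
    while stack:
        node = stack.pop()
        if node.exact:
            yield node
        stack.extend(node.children)

def static_eval(features, weights):
    """Linear static evaluation."""
    return sum(f * w for f, w in zip(features, weights))

def train_on_tree(root: SearchNode, weights, lr=0.001):
    """One pass of updates: move each exact node's static eval toward its
    backed-up search score (a gradient step for a linear evaluation)."""
    for node in exact_nodes(root):
        error = node.score - static_eval(node.features, weights)
        for i, f in enumerate(node.features):
            weights[i] += lr * error * f
    return weights
```

Compared with learning only from the root move of each game position, one search tree yields thousands of such examples per move, which is the "huge mass of information" the post is pointing at.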