Author: Anthony Cozzie
Date: 14:25:14 06/28/04
On June 28, 2004 at 17:15:31, Jay Scott wrote:

>On June 28, 2004 at 15:19:16, Anthony Cozzie wrote:
>
>>Genetic algorithms and comparison training work completely differently.
>>
>>Genetic algorithms: Make a random change. Play X games. If the modified
>>version wins, keep the change. Otherwise don't. The problem with this is the
>>huge number of games. A small eval change might require 500 games to get right;
>>even at 1 minute per game that is still 20+ hours.
>>
>>Comparison training: Get a bunch of GM games. Have the computer search each
>>position and find the "dominant position" (the position at the end of the PV).
>>Compute the eval of the position after the best move vs. the position after some
>>other move. If the best move has a better score, then all is good. Otherwise
>>penalize. Then I use the simulated annealing algorithm to decide whether to
>>keep the change.
>
>You're conflating two independent things: the optimization algorithm and the
>objective function that it is optimizing. A genetic algorithm can optimize any
>function, and there's no reason it can't be used with comparison training.
>Simulated annealing can optimize any function too. In the last sentence you
>hint at what you're really doing, but you don't explain it fully. It sounds like
>you're maybe hill-climbing with each step decided by a full simulated annealing
>run?

What I do is, for a set of positions A (all the various positions from the GM
games), compute B (the position after the best move) and C (the set of positions
after the unchosen moves). I then compute

  forall c in C: if (score(c) > score(B)) penalty += score(c) - score(B);

The total penalty is then the sum over all the positions. This is the function
I try to optimize with SA (a rough sketch is further down). I can't really think
of a way that it would make sense to use genetic algorithms here, unless you
just mean SA with T=0.

>As you say, training against the game result only is extremely slow. That's no
>fun.
>
>I believe that training against GM games is not likely to work well. I'll be
>glad if you can prove me wrong, but here is my reason:
>
>* Computers and humans think too differently. Grandmasters know so much more
>about the game that they make decisions based on considerations that search and
>evaluation together do not understand. And conversely, the search looks at many
>things that grandmasters do not care about. So tuning to play moves that are
>statistically similar to good moves is not the same thing as tuning to play good
>moves.
>
>By training on each move rather than on each game, you're getting a lot more
>training data and can learn faster. This principle can and should be extended
>much, much further. For example, traditional temporal difference learning (which
>is not a learning algorithm, it is an objective function that many learning
>algorithms can use) learns from each move made in the game. KnightCap extended
>the idea to learn from each move predicted in the principal variation, giving
>better results. In principle, it should be possible to learn at least from every
>node in the search tree which can be assigned an exact score rather than a
>bound--and I am sure that there are ways to exploit the bounds too, though it's
>trickier. The search tree contains a huge mass of information about the
>program's thinking, and the more of it that can be extracted for learning, the
>better.
>
>  Jay

Good chess is good chess. The hope is that it generates something reasonable.
In general it has gotten similar values to the Crafty defaults.
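To make the penalty function above a bit more concrete, here is a rough C
sketch of the objective and of one SA step. All the names (Position, score(),
TrainingExample, and so on) are just placeholders for illustration, not the
actual Zappa code:

#include <math.h>
#include <stdlib.h>

#define MAX_MOVES 128            /* upper bound on legal moves          */

typedef struct {
    unsigned char squares[64];   /* board representation omitted        */
} Position;

/* placeholder: eval of the "dominant position" reached by searching
   from pos with the current weight vector w */
extern int score(const Position *pos, const double *w);

typedef struct {
    Position best;               /* B: position after the GM move       */
    Position others[MAX_MOVES];  /* C: positions after the other moves  */
    int      num_others;
} TrainingExample;

/* total penalty: for every unchosen move that scores better than the
   GM move, add the score difference; sum over all positions */
double total_penalty(const TrainingExample *ex, int n, const double *w)
{
    double penalty = 0.0;
    for (int i = 0; i < n; i++) {
        int b = score(&ex[i].best, w);
        for (int j = 0; j < ex[i].num_others; j++) {
            int c = score(&ex[i].others[j], w);
            if (c > b)
                penalty += (double)(c - b);
        }
    }
    return penalty;
}

/* one simulated-annealing step over the eval weights; with T = 0 this
   degenerates into plain hill-climbing */
void sa_step(TrainingExample *ex, int n, double *w, int nw, double T)
{
    double before = total_penalty(ex, n, w);

    int    k     = rand() % nw;                 /* pick a weight       */
    double saved = w[k];
    w[k] += (double)rand() / RAND_MAX - 0.5;    /* small random change */

    double after = total_penalty(ex, n, w);
    double delta = after - before;

    /* keep improvements; keep a worse change only with probability
       exp(-delta/T), otherwise undo it */
    if (delta > 0.0 &&
        (T <= 0.0 || (double)rand() / RAND_MAX >= exp(-delta / T)))
        w[k] = saved;
}

Note that total_penalty() depends only on the positions and the weights, so SA,
hill-climbing, or even a GA could be bolted on top of the same objective, which
I think is Jay's point about separating the optimizer from the function.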
I don't think autotuning will ever be as good as manual tuning, but it gives you
a reasonable starting place, and it also tells you whether a new parameter is
good or complete trash.

anthony