Computer Chess Club Archives




Subject: Re: Automated Tuning

Author: Jay Scott

Date: 14:06:33 01/13/98


On January 12, 1998 at 17:43:30, Stuart Cracraft wrote:
>What are the experiences of this group with automated tuning?

Evaluation functions created by machine learning work great
for Othello and backgammon--they're far stronger than any
handwritten evaluators. But chess is harder, and so far
programmers have the edge over learning programs for chess.

I have 100% confidence that machine learning will win when
we understand it well enough. But if your goal is to create
a performance program and you'd prefer to avoid high-risk
original research, then you should know that machine learning
is not going to solve your problem! Automatic tuning is good
for giving you an idea of whether a new evaluation term is
useful and what to try first for its weight, but tuning by
hand will work better. Today!

>   - temporal difference learning (Tesauro-style) or best-fit
>     (Nowatzyk/Deep-Blue style) based on master games

These two are usually seen as very different from each other.
Temporal difference methods usually learn on-line or from small
batches of games, from games played by the program itself (either
against itself or against other opponents). It's seen as
"reinforcement learning", which means that the learning
program gradually nudges the evaluator so that successive
evaluations along a game agree better with each other and
with the final result. Best-fit methods are usually applied
to large batches
of games or positions, from games played by strong human players.
It's seen as a statistical model-fitting problem, which is
superficially quite different from reinforcement learning.
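
To make the temporal-difference side concrete, here's a minimal
sketch, assuming a linear evaluator v(s) = w . features(s). It's
only an illustration, not any particular program's actual code;
features, result and game_positions are placeholder names:

  import numpy as np

  def td_lambda_update(weights, game_positions, features, result,
                       alpha=0.01, lam=0.7):
      # One offline TD(lambda)-style update from a single game.
      #   weights        -- current evaluation weights (numpy array)
      #   game_positions -- the game's positions, in move order
      #   features       -- function: position -> feature vector
      #   result         -- final result from the evaluator's side
      #   alpha, lam     -- learning rate and trace-decay parameter
      values = [float(np.dot(weights, features(p)))
                for p in game_positions]
      values[-1] = result        # anchor the end to the outcome
      deltas = [values[t + 1] - values[t]
                for t in range(len(values) - 1)]
      update = np.zeros_like(weights)
      for t in range(len(deltas)):
          # credit each position with the discounted sum of the
          # evaluation errors that follow it (TD(lambda))
          err = sum((lam ** (j - t)) * deltas[j]
                    for j in range(t, len(deltas)))
          update = update + err * features(game_positions[t])
      return weights + alpha * update

Run that over every game the program plays and the evaluator
drifts toward making its successive evaluations consistent with
how the games actually end.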

But I'm with you in thinking of these as slightly different
variants of basically the same thing. Here's my take:

Online methods are likely to be faster than batch, but both
have been tried and both work. I think that it's a mistake to
train solely on human games, because humans are so different
from computers. For example, I think it makes sense that
computers should give the queen a higher material value than
humans do, because computers are better at exploiting the
queen's tactical trickiness. On the other hand, it's also
hard to get good results solely from self-play games, because
the program is not likely to be good at exploiting its own
mistakes (or else it wouldn't make them!), and because you
don't get enough variety. I like the KnightCap approach of
learning from the program's games against a variety of other
opponents, but there are a bunch of other possibilities too.
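
Whichever source the positions come from, the batch best-fit
side is just a regression: fit the weights so the evaluation
predicts the game result over a large pile of positions. This
is only an illustration (plain ridge regression), not the
actual Nowatzyk/Deep Blue procedure; position_features and
results are whatever you extract from your game collection:

  import numpy as np

  def batch_fit(position_features, results, l2=1e-3):
      # Least-squares fit of linear evaluation weights.
      #   position_features -- one feature vector per position
      #   results           -- game result for each position,
      #                        e.g. +1/0/-1 from the side to move
      #   l2                -- small ridge term so the solve stays
      #                        well-behaved with correlated features
      X = np.asarray(position_features, dtype=float)
      y = np.asarray(results, dtype=float)
      A = X.T @ X + l2 * np.eye(X.shape[1])
      return np.linalg.solve(A, X.T @ y)

Same picture as the temporal-difference sketch, except every
position is pushed straight at the final result instead of at
the next evaluation--which is one way of seeing why the two
methods are basically the same thing.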

>   - survival-of-the-fittest round-robin matches between same program
>     with slightly different coefficients or weights for evaluation
>     function

This is called a "hillclimbing" method. I'd suggest that genetic
algorithms are likely to find good evaluators faster.
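
The loop itself is simple either way. Here's a hillclimbing
sketch; play_match is a stand-in for "play a long match between
two weight settings and return the first one's score", which is
where all the real cost hides (a genetic algorithm would keep a
population and add crossover):

  import random

  def hillclimb(base_weights, play_match, rounds=50,
                challengers=8, sigma=0.05):
      # Toy hillclimbing loop over evaluation weights.
      #   base_weights -- starting weights (list of floats)
      #   play_match   -- placeholder: play_match(a, b) -> score
      #                   of a against b over many games
      best = list(base_weights)
      for _ in range(rounds):
          for _ in range(challengers):
              # perturb the current best slightly and keep the
              # challenger only if it wins the match
              cand = [w * (1.0 + random.gauss(0.0, sigma))
                      for w in best]
              if play_match(cand, best) > 0.5:
                  best = cand
      return best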

This kind of thing sucks up computer time like a black hole. You
need a lot of games to tell the difference between two similar
programs, and you have to compare lots of programs to find a good
evaluator. You'll need months of time and/or many processors.
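
A quick back-of-the-envelope shows why: the standard error of a
match score shrinks only like 1/sqrt(games), so a small edge
needs an enormous match to show up. (The 0.45 per-game standard
deviation below is a rough figure; it depends on the draw rate.)

  import math

  def games_needed(score_edge, per_game_sd=0.45, z=1.96):
      # Rough number of games to detect a per-game score edge at
      # ~95% confidence: require the edge to exceed z standard
      # errors of the match score, per_game_sd / sqrt(n).
      return math.ceil((z * per_game_sd / score_edge) ** 2)

  print(games_needed(0.02))   # 52% vs 50%: about 1945 games

So just telling a 52% program from a 50% one costs roughly two
thousand games, and the differences you care about while tuning
are usually smaller than that.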

Christopher Rosin used
this kind of idea to create a so-so program for 9x9 go, as part
of his PhD thesis work. He needed 2 processor-years of Cray time.

