Computer Chess Club Archives




Subject: Re: Automated Tuning

Author: Jay Scott

Date: 14:06:33 01/13/98


On January 12, 1998 at 17:43:30, Stuart Cracraft wrote:
>What are the experiences of this group with automated tuning?

Evaluation functions created by machine learning work great
for Othello and backgammon--they're far stronger than any
handwritten evaluators. But chess is harder, and so far
programmers have the edge over learning programs for chess.

I have 100% confidence that machine learning will win when
we understand it well enough. But if your goal is to create
a performance program and you'd prefer to avoid high-risk
original research, then you should know that machine learning
is not going to solve your problem! Automatic tuning is good
for giving you an idea of whether a new evaluation term is
useful and what to try first for its weight, but tuning by
hand will work better. Today!

>   - temporal difference learning (Tesauro-style) or best-fit
>     (Nowatzyk/Deep-Blue style) based on master games

These two are usually seen as very different from each other.
Temporal difference methods usually learn on-line or from small
batches of games, from games played by the program itself (either
against itself or against other opponents). It's seen as
"reinforcement learning", which means that the learning
program gradually nudges the evaluator so that successive
evaluations along a game agree better with each other and
with the final result. Best-fit methods are usually applied
to large batches
of games or positions, from games played by strong human players.
It's seen as a statistical model-fitting problem, which is
superficially quite different from reinforcement learning.
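
To make the temporal-difference side concrete, here's a minimal
sketch, assuming a linear evaluator v(s) = w . features(s). It's
only an illustration, not any particular program's actual code;
features, result and game_positions are placeholder names:

  import numpy as np

  def td_lambda_update(weights, game_positions, features, result,
                       alpha=0.01, lam=0.7):
      # One offline TD(lambda)-style update from a single game.
      #   weights        -- current evaluation weights (numpy array)
      #   game_positions -- the game's positions, in move order
      #   features       -- function: position -> feature vector
      #   result         -- final result from the evaluator's side
      #   alpha, lam     -- learning rate and trace-decay parameter
      values = [float(np.dot(weights, features(p)))
                for p in game_positions]
      values[-1] = result        # anchor the end to the outcome
      deltas = [values[t + 1] - values[t]
                for t in range(len(values) - 1)]
      update = np.zeros_like(weights)
      for t in range(len(deltas)):
          # credit each position with the discounted sum of the
          # evaluation errors that follow it (TD(lambda))
          err = sum((lam ** (j - t)) * deltas[j]
                    for j in range(t, len(deltas)))
          update = update + err * features(game_positions[t])
      return weights + alpha * update

Run that over every game the program plays and the evaluator
drifts toward making its successive evaluations consistent with
how the games actually end.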

But I'm with you in thinking of these as slightly different
variants of basically the same thing. Here's my take:

Online methods are likely to be faster than batch, but both
have been tried and both work. I think that it's a mistake to
train solely on human games, because humans are so different
from computers. For example, I think it makes sense that
computers should give the queen a higher material value than
humans do, because computers are better at exploiting the
queen's tactical trickiness. On the other hand, it's also
hard to get good results solely from self-play games, because
the program is not likely to be good at exploiting its own
mistakes (or else it wouldn't make them!), and because you
don't get enough variety. I like the KnightCap approach of
learning from the program's games against a variety of other
opponents, but there are a bunch of other possibilities too.
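
Whichever source the positions come from, the batch best-fit
side is just a regression: fit the weights so the evaluation
predicts the game result over a large pile of positions. This
is only an illustration (plain ridge regression), not the
actual Nowatzyk/Deep Blue procedure; position_features and
results are whatever you extract from your game collection:

  import numpy as np

  def batch_fit(position_features, results, l2=1e-3):
      # Least-squares fit of linear evaluation weights.
      #   position_features -- one feature vector per position
      #   results           -- game result for each position,
      #                        e.g. +1/0/-1 from the side to move
      #   l2                -- small ridge term so the solve stays
      #                        well-behaved with correlated features
      X = np.asarray(position_features, dtype=float)
      y = np.asarray(results, dtype=float)
      A = X.T @ X + l2 * np.eye(X.shape[1])
      return np.linalg.solve(A, X.T @ y)

Same picture as the temporal-difference sketch, except every
position is pushed straight at the final result instead of at
the next evaluation--which is one way of seeing why the two
methods are basically the same thing.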

>   - survival-of-the-fittest round-robin matches between same program
>     with slightly different coefficients or weights for evaluation
>     function

This is called a "hillclimbing" method. I'd suggest that genetic
algorithms are likely to find good evaluators faster.
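
The loop itself is simple either way. Here's a hillclimbing
sketch; play_match is a stand-in for "play a long match between
two weight settings and return the first one's score", which is
where all the real cost hides (a genetic algorithm would keep a
population and add crossover):

  import random

  def hillclimb(base_weights, play_match, rounds=50,
                challengers=8, sigma=0.05):
      # Toy hillclimbing loop over evaluation weights.
      #   base_weights -- starting weights (list of floats)
      #   play_match   -- placeholder: play_match(a, b) -> score
      #                   of a against b over many games
      best = list(base_weights)
      for _ in range(rounds):
          for _ in range(challengers):
              # perturb the current best slightly and keep the
              # challenger only if it wins the match
              cand = [w * (1.0 + random.gauss(0.0, sigma))
                      for w in best]
              if play_match(cand, best) > 0.5:
                  best = cand
      return best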

This kind of thing sucks up computer time like a black hole. You
need a lot of games to tell the difference between two similar
programs, and you have to compare lots of programs to find a good
evaluator. You'll need months of time and/or many processors.
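
A quick back-of-the-envelope shows why: the standard error of a
match score shrinks only like 1/sqrt(games), so a small edge
needs an enormous match to show up. (The 0.45 per-game standard
deviation below is a rough figure; it depends on the draw rate.)

  import math

  def games_needed(score_edge, per_game_sd=0.45, z=1.96):
      # Rough number of games to detect a per-game score edge at
      # ~95% confidence: require the edge to exceed z standard
      # errors of the match score, per_game_sd / sqrt(n).
      return math.ceil((z * per_game_sd / score_edge) ** 2)

  print(games_needed(0.02))   # 52% vs 50%: about 1945 games

So just telling a 52% program from a 50% one costs roughly two
thousand games, and the differences you care about while tuning
are usually smaller than that.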

Christopher Rosin used
this kind of idea to create a so-so program for 9x9 go, as part
of his PhD thesis work. He needed 2 processor-years of Cray time.

