Computer Chess Club Archives


Subject: What to do against a learner

Author: Amir Ban

Date: 14:11:15 03/20/98



"Learner" is a big word for what we have today. Whatever benefit lies in
today's learners depends on a simple fact: Programs play more or less in
a deterministic manner when they leave book, especially if the time
controls are fixed.

A learner avoids losing the same game twice by remembering the loss and
varying a move. The move it varies is not the losing move, and probably
not even a bad move in the ordinary sense of the word. It's just that
from this line the program consistently plays into a loss against some
specific line of the opponent. Similarly, a learner can repeat a win against a
non-learner by relying on it to repeat a losing line.

The same idea is behind the trick of adding autoplayed games to a book,
as is rumoured to have been done by MCPro 7.1 and used very effectively
against some programs, including Hiarcs. The victim, being deterministic, was
known beforehand to follow a losing line, and so it did. To be
effective, the autoplayed games must have been played on the same
machine with the same time controls as would be used in actual
competition (SSDF, in this case). I don't think that when this debate
took place anyone noticed that the autoplayed games were probably also
played with permanent brain OFF. They had to be, because otherwise the
timing would not have been right, and the exact line might not have
been repeated.

The general consensus here is that against a learner, you need another
learner. I don't think so at all. A learner would be overkill. This
can be handled by simpler means, which in addition need no long-term
memory as a learner does. The solution is, in one word: VARY.

Introduce variation through randomization. Don't play the same game
twice, or at least make this unlikely. Do this especially when you have
recently left book. Every programmer can probably think of half a dozen
methods to do this, but I'll try to give some advice anyway:

Adding a small random value to the evaluation is one obvious way to go,
and may work. Drawbacks are that this may be somewhat expensive, and
that there may be unwanted complications if the same position doesn't
evaluate consistently in a single search-tree. An alternative, which I
would recommend, is to randomize some eval coefficients before starting
each move, and use those values consistently in the search-tree. Some
coefficients have only a small effect on the best move chosen, and some
are too important to be varied, but my guess is that the terms
controlling mobility, development and centre control, for example, may
be slightly varied with hardly any effect on strength, and are almost
guaranteed to produce a different move at least once in, say, 6 moves,
especially in the sort of positions that you have out of book. Basically
that's all you need.
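
A minimal sketch of the per-move coefficient randomization, in Python. The
weight names, their values, the 10% spread and the linear evaluate() function
are all assumptions made for illustration, not taken from any real engine. The
point it tries to show is that the weights are drawn once before the search
starts, so the same position scores identically everywhere in that search-tree.

```python
import random

# Hypothetical baseline weights; names and values are illustrative only.
BASE_WEIGHTS = {"material": 100.0, "mobility": 4.0, "development": 6.0, "centre": 5.0}
# Terms assumed safe to vary; material is deliberately left untouched.
JITTERED = ("mobility", "development", "centre")


def weights_for_move(rng, spread=0.10):
    """Return a copy of BASE_WEIGHTS with the soft terms scaled by up to +/- spread."""
    weights = dict(BASE_WEIGHTS)
    for name in JITTERED:
        weights[name] *= 1.0 + rng.uniform(-spread, spread)
    return weights


def evaluate(features, weights):
    """Toy linear evaluation: weighted sum of the feature values."""
    return sum(weights[name] * value for name, value in features.items())


rng = random.Random()            # unseeded, so successive games differ
weights = weights_for_move(rng)  # drawn once, before the search for this move
features = {"material": 1, "mobility": 3, "development": -1, "centre": 2}
# Every call with the same weights gives the same score within this search.
score = evaluate(features, weights)
```

A new set of weights would be drawn before the next move, which is what
produces the variation from game to game.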

Another reasonable possibility is to sometimes play the second-best move,
but only if it's almost as good as the best move. This is guaranteed to
produce a variation, though it is slightly expensive to compute.
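
A rough sketch of the second-best rule, in Python. It assumes the search has
already produced a trustworthy score for every root move (getting that is
presumably the expensive part mentioned above). The function name, the
10-centipawn margin and the 25% chance are hypothetical values chosen only for
illustration.

```python
import random


def choose_move(scored_moves, rng, margin=10, chance=0.25):
    """Pick a root move from (move, score) pairs, scores in centipawns.

    Normally returns the best move. With probability `chance` it returns the
    second-best move instead, but only if that move is within `margin` of
    the best score.
    """
    ranked = sorted(scored_moves, key=lambda pair: pair[1], reverse=True)
    best_move, best_score = ranked[0]
    if len(ranked) > 1:
        second_move, second_score = ranked[1]
        close_enough = best_score - second_score <= margin
        if close_enough and rng.random() < chance:
            return second_move
    return best_move


rng = random.Random()
move = choose_move([("e4", 30), ("d4", 26), ("a4", -40)], rng)
# move is "e4" most of the time and "d4" occasionally; never "a4".
```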

This is very easy to test. Play automatic games from several fixed
positions and see how many duplicates you get. Use fixed ply depths
(timing is another source of random variation that you don't want to
measure here). You don't need to avoid duplicates completely, just get
them down to a level where you believe a learner would be wasting its
efforts.
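
One possible way to count the duplicates, sketched in Python. It assumes each
autoplayed game has been recorded as a list of move strings; how the games are
produced (engine, positions, fixed depth) is left out, and the function name
is hypothetical.

```python
from collections import Counter


def duplicate_rate(games):
    """Fraction of games that repeat an earlier game move for move."""
    if not games:
        return 0.0
    counts = Counter(tuple(game) for game in games)
    # Each distinct game counts once; every further copy is a duplicate.
    repeats = sum(n - 1 for n in counts.values())
    return repeats / len(games)


games = [
    ["e4", "e5", "Nf3"],
    ["e4", "e5", "Nf3"],  # exact repeat of the first game
    ["d4", "d5", "c4"],
    ["e4", "c5", "Nf3"],
]
print(duplicate_rate(games))  # prints 0.25
```

Running this per starting position, with and without randomization, gives a
simple before/after figure to compare.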

Remember that the main source for variation is the book. If you have a
large book, you are much safer than with a small book (judging by the
MCPro success against Hiarcs, Hiarcs has a small tournament book). There
is still a danger that someone will take you out of book early, and make
you follow a losing line again and again, so I propose to measure how
safe you are this way:

When you are following book, keep track of the probability of reaching
the current position. When you are out of book, look at the probability.
If it's below some threshold, you are fine and do nothing. If you are
above the threshold, take evasive action through randomization. You
should have a rough idea of how effective your evasive action is, and
this allows you to adjust the probability by some factor with each move.
When you are below your threshold, relax and resume your normal mode.
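
The procedure above could be sketched in Python roughly as follows. The class
name, the threshold of 0.01 and the factor of 0.5 are assumptions chosen for
illustration. It also assumes that the "probability of reaching the current
position" is the product of the probabilities with which your own book picks
each of its moves.

```python
class RepeatGuard:
    """Tracks how likely it is that this game reaches the same position again."""

    def __init__(self, threshold=0.01, evasion_factor=0.5):
        self.threshold = threshold            # below this, no action is needed
        self.evasion_factor = evasion_factor  # assumed effect of one randomized move
        self.probability = 1.0

    def on_book_move(self, move_probability):
        """Call for each own book move with the chance the book gave that move."""
        self.probability *= move_probability

    def should_randomize(self):
        """True while the position is still too likely to be repeated."""
        return self.probability > self.threshold

    def on_move_out_of_book(self):
        """Call once per own move out of book; returns True if randomizing."""
        if self.should_randomize():
            self.probability *= self.evasion_factor
            return True
        return False


guard = RepeatGuard()
for p in (0.5, 0.2):  # left book after two book moves
    guard.on_book_move(p)
randomized_moves = 0
while guard.on_move_out_of_book():
    randomized_moves += 1
# With these numbers the guard randomizes 4 moves, then relaxes.
```

A deeper or more varied book drives the probability down before leaving book,
so fewer (or no) randomized moves are needed.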

This procedure is "nice". It is purely defensive. If you feel like being
aggressive, go ahead and implement a learner. But learners are currently
not "nice", as some people here have pointed out. They don't really
learn anything serious about anything, but just try to exploit an
incidental feature of today's programs (determinism).

Of course, if you meet a learner that really understands something,
watch out! If someone can figure out that you play endgames weakly, or
that you are better in attacking the king than in anything else, or that
your search is ineffective in some situations, and furthermore knows
what to do with such information, you are still in trouble, but for
computers that's a long way down the road.

Amir





Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.