Author: Amir Ban
Date: 14:11:15 03/20/98
"Learner" is a big word for what we have today.

Whatever benefit lies in today's learners depends on a simple fact: programs play more or less deterministically once they leave book, especially if the time controls are fixed. A learner avoids losing the same game twice by remembering the loss and varying a move. The move it varies is not the losing move, and probably not even a bad move in the ordinary sense of the word. It is just that this move leads the program, playing consistently against some specific line, to a loss. Similarly, a learner can repeat a win against a non-learner by relying on it to repeat a losing line.

The same idea is behind the trick of adding autoplayed games to a book, as is rumoured to have been done by MCPro 7.1 and used very effectively against some programs, including Hiarcs. The victim, being deterministic, was known beforehand to follow a losing line, and so it did. To be effective, the autoplayed games should have been played on the same machine with the same time controls as would be used in actual competition (SSDF, in this case). I don't think that when this debate took place anyone noticed that the autoplayed games were probably also played with permanent brain OFF. They had to be, because otherwise the timing would not have been right, and the exact line might not have been repeated.

The general consensus here is that against a learner, you need another learner. I don't think so at all. A learner would be overkill. This can be handled by simpler means, which in addition need no long-term memory as a learner does.

The solution is, in one word: VARY. Introduce variation through randomization. Don't play the same game twice, or at least make this unlikely. Do this especially when you have recently left book.

Every programmer can probably think of half a dozen methods to do this, but I'll try to give some advice anyway:

Adding a small random value to the evaluation is one obvious way to go, and may work.
Drawbacks are that this may be somewhat expensive, and that there may be unwanted complications if the same position doesn't evaluate consistently within a single search tree.

An alternative, which I would recommend, is to randomize some eval coefficients before starting each move, and use those values consistently throughout the search tree. Some coefficients have only a small effect on the best move chosen, and some are too important to be varied, but my guess is that the terms controlling mobility, development and centre control, for example, may be slightly varied with hardly any effect on strength, and are almost guaranteed to produce a different move at least once in, say, 6 moves, especially in the sort of positions that you have out of book. Basically that's all you need.

Another reasonable possibility is to sometimes play the second-best move, but only if it's almost as good as the best move. This is guaranteed to produce a variation, though slightly expensive to compute.

Whether you vary enough is very easy to test. Play automatic games from several fixed positions and see how many duplicates you get. Use fixed ply depths (timing is another source of random variation that you don't want to measure here). You don't need to avoid duplicates completely, just get them down to a level where you believe a learner would be wasting its efforts.

Remember that the main source of variation is the book. If you have a large book, you are much safer than with a small book (judging by the MCPro success against Hiarcs, Hiarcs has a small tournament book). There is still a danger that someone will take you out of book early, and make you follow a losing line again and again, so I propose to measure how safe you are this way:

When you are following book, keep track of the probability of reaching the current position. When you are out of book, look at the probability. If it's below some threshold, you are fine and do nothing. If you are above the threshold, take evasive action through randomization.
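
As a rough sketch of the check described so far, combined with the per-move coefficient randomization recommended above, something like the following Python might do. Everything specific in it is an assumption made for illustration: the coefficient names and values, the 15% spread, the function names, and the choice to count only your own book selections when computing the probability. It is not taken from any real program.

```python
import random

# Hypothetical evaluation weights; names and values are illustrative only.
BASE_COEFFS = {
    "material": 100,
    "king_safety": 40,
    "mobility": 10,
    "development": 12,
    "centre_control": 8,
}

# Terms assumed safe to vary slightly (the post's guess, not a measured fact).
VARIABLE_TERMS = ("mobility", "development", "centre_control")


def line_probability(book_move_probs):
    """Probability of reaching the current position, given the probability
    with which the book selected each of our own moves so far (assumption:
    the opponent's choices are not counted)."""
    p = 1.0
    for prob in book_move_probs:
        p *= prob
    return p


def coeffs_for_move(base, probability, threshold, rng, spread=0.15):
    """Return the coefficients to use for one whole search.

    At or below the threshold the line is already unlikely, so the base
    values are returned unchanged. Above it, each variable term is scaled
    by a random factor in [1 - spread, 1 + spread], drawn once per move so
    that every node of the search tree sees the same values.
    """
    coeffs = dict(base)
    if probability <= threshold:
        return coeffs
    for term in VARIABLE_TERMS:
        coeffs[term] = base[term] * rng.uniform(1.0 - spread, 1.0 + spread)
    return coeffs


# Usage: two book moves, each picked half the time, then we leave book.
rng = random.Random(1998)
p = line_probability([0.5, 0.5])  # 0.25, well above a 0.05 threshold
coeffs = coeffs_for_move(BASE_COEFFS, p, threshold=0.05, rng=rng)
```

Drawing the random factors once per move, and not once per evaluation call, is what keeps a position's score consistent inside a single search.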
You should have a rough idea of how effective your evasive action is, and this allows you to adjust the probability by some factor with each move. When you are below your threshold, relax and resume your normal mode.

This procedure is "nice". It is purely defensive. If you feel like being aggressive, go ahead and implement a learner. But learners are currently not "nice", as some people here have pointed out. They don't really learn anything serious about anything, but just try to exploit an incidental feature of today's programs (determinism).

Of course, if you meet a learner that really understands something, watch out! If someone can figure out that you play endgames weakly, or that you are better at attacking the king than at anything else, or that your search is ineffective in some situations, and furthermore knows what to do with such information, you are still in trouble, but for computers that's a long way down the road.

Amir