Author: Dana Turnmire
Date: 22:53:52 05/08/01
Here is an interesting article from CCR Vol. 3 No. 2 (Late 1992/Early 1993) by Larry Kaufman. I thought it was interesting what he had to say about "default settings." "Don Daily (professional programmer and 1800 ballpark chess amateur) and myself (amateur programmer but expert on computer algorithms, rating 2486) have worked together now for several years developing four distinct chess programs (Rex, Alpha, Socrates, and Titan), each clearly better than its predecessor when allowance is made for the fact that Alpha and Titan are written in "C" while Rex and Socrates are in Assembly language, which is faster. I thought it might be interesting to readers to get an idea of how we go about improving our programs. First of all, to determine if a change is an improvement, we need some method for judging this. If the change is simply a small scoring change to correct some poor move we have observed, it must be left to my judgement, as such changes are apt to be worth only a fraction change of a rating point, which could never be measured. But let's assume we are talking about some fairly major change to the program. The first thing we do after such a change is to run it on a problem set, such as the one published in this and the preceding issue of CCR. This gives us some idea of the effects of the change, but we must be very careful. The problem is that some changes may speed up the solution of most or even all problems, yet have a subtle degrading effect on the positional play. Or, it may slow down most of the problems a bit while subtly improving the positional play. So the next step is automated testing, of which there are two types. The first method we call "self-testing," in which the program with the change plays against the same program without it, all on one computer. We use a fixed set of 100 openings that stop after five moves per side. Each version gets white once and black once in each opening, so if time permits a two hundred game match may be played. Since the computer is fully dedicated to whichever side is on move, no thinking on the opponent's time is possible. Presumably this should affect both sides equally, but this may not always be true. If the nature of the change is such as to affect the speed of the program, we must test it on an equal time basis; if it does not affect speed, we often test on fixed depth searches to minimize the luck factor. Since we have five 486 computers between us, if we set them at different levels we can play a thousand games to evaluate a change without ever having to touch anything once the test has begun. Ideally, it would be nice to test at tournament time controls, but since we always have so many ideas to test we find it necessary to do most testing at levels ranging from 5" to 30" per move. The luck factor is so great that we must normally run several hundred games to have any confidence in the result, though some changes are dramatic enough to prove themselves quickly. Some people have argued that self-testing is not very reliable because the program may not know how to punish its own positional errors, and because a change that speeds up the program (at some price) may score better against its "brother" than it should. We have not generally found these factors to be much of a problem, but we do have an alternative method of testing. We call this second method "auto-testing." We cable two identical computers together, and use a "referee" program that Don wrote to allow our program to play a totally different one. 
Some people have argued that self-testing is not very reliable because the program may not know how to punish its own positional errors, and because a change that speeds up the program (at some price) may score better against its "brother" than it should. We have not generally found these factors to be much of a problem, but we do have an alternative method of testing. We call this second method "auto-testing." We cable two identical computers together and use a "referee" program that Don wrote to allow our program to play a totally different one. Since he must write a separate program for each opponent we wish to test against, we only use a few opponents. At this time we can test against MChess, Zarkov, Fritz (1), and against our own earlier programs (i.e. Titan vs. Socrates). This has the advantage of allowing both programs to think on the opponent's time, and it is not subject to the criticisms mentioned above against self-testing. The disadvantage is that judging whether a change is beneficial requires twice as many games, since each version must be auto-tested against the same opponent. I don't know which way is really better overall; we use both methods.

Now let's talk about how we actually try to improve the program. One way is by "rule-base" changes. Don created a chess programming language that allowed me to write hundreds of rules that the program processes before beginning its search. Based on these rules it decides where it would like to try to place its pieces, other things being equal. For example, if there are a lot of pieces on the board, the program is heavily penalized for allowing its king to be brought out to the center of the board, but if it is an endgame, it is rewarded for centralizing its king. There are bonuses for centralizing knights, for putting bishops on long diagonals, for occupying holes, for bringing the queen near the enemy king, for centralizing pawns (i.e. capturing towards the center), and countless other bonuses and penalties. If I observe bad play that can be easily corrected by this rule-base, I do so. However, other parts of the evaluation, such as pawn structure and mobility, must be calculated at the end of each variation searched, and so can only be changed by Don since they are not part of the "rule-base."

There are also many parameters of the search itself which I can adjust and test at will. For example, the number of selective plies may be varied, the degree of selectivity on each selective ply may be set, checks in the quiescence search may be turned on or off, certain short-cuts we take may be turned on or off, and so on. We expect to leave in enough of these options so that those who purchase Socrates may have the fun of self-testing various possibilities themselves. Perhaps someone will prove that our default settings are in fact not the best! We always have the fear that some choice we make based on relatively fast games will prove to be the wrong choice at 40/2, so if anyone who gets Socrates wants to self-test any parameters at 2- or 3-minute levels on a 486, we would like to hear the result of the 200 games! Testing can be stopped after any game and restarted when the machine is free, at the game number corresponding to the number of games already played.

We occasionally try radical changes to the program if we can see some rationale. Often this means some sort of extension for a certain class of moves, such as certain recaptures or replies to certain threats. Usually the extension does not prove to be a benefit, due to the general slowing down of the program it entails, but a few extensions have proven their worth and are retained. We keep hoping to hit on the one quick change that will add a hundred points to the program, but the reality is that our progress has mostly been achieved in tiny increments of 2-5 points at a time, which are really too small to be measured with any certainty. But they do add up!"
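[As an illustration of the pre-search "rule-base" scoring Kaufman describes, here is a minimal sketch in C. It is not Socrates' rule language, and all numbers are invented; it simply fills in per-square bonuses once before the search begins: knights are encouraged toward the center, and the king is penalized for centralizing in the middlegame but rewarded for it in the endgame.]

    /* Illustrative sketch of pre-search piece-placement bonuses. */
    enum { BOARD_SQUARES = 64 };

    /* distance of a square from the center: 0 (central) to 3 (edge/corner) */
    static int center_distance(int sq)
    {
        int file = sq % 8, rank = sq / 8;
        int df = file < 4 ? 3 - file : file - 4;
        int dr = rank < 4 ? 3 - rank : rank - 4;
        return df > dr ? df : dr;
    }

    /* Fill per-square bonuses (in centipawns) once, before the search.
     * With many pieces on the board a centralized king is penalized;
     * in the endgame it is rewarded instead. */
    static void build_rule_base(int knight_bonus[BOARD_SQUARES],
                                int king_bonus[BOARD_SQUARES],
                                int is_endgame)
    {
        for (int sq = 0; sq < BOARD_SQUARES; sq++) {
            int d = center_distance(sq);
            knight_bonus[sq] = (3 - d) * 8;                 /* centralize knights */
            king_bonus[sq]   = is_endgame ? (3 - d) * 10    /* centralize the king... */
                                          : -(3 - d) * 20;  /* ...but not in the middlegame */
        }
    }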
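[The adjustable search parameters he mentions -- selective plies, degree of selectivity, checks in the quiescence search, and so on -- are often grouped into a single set of default settings that a tester can override one field at a time. The sketch below is a hypothetical example, not Socrates' actual options or values.]

    /* Hypothetical "default settings" for user-adjustable search options. */
    #include <stdbool.h>

    struct search_options {
        int  selective_plies;      /* how many plies use selective pruning        */
        int  selectivity_margin;   /* pruning margin (centipawns) on those plies  */
        bool checks_in_quiescence; /* search checking moves in quiescence or not  */
        bool recapture_extension;  /* extend certain recaptures by one ply        */
    };

    static const struct search_options default_options = {
        .selective_plies      = 4,
        .selectivity_margin   = 150,
        .checks_in_quiescence = true,
        .recapture_extension  = false,
    };

[A tester would change one field, replay the fixed opening set against the default version, and compare the match scores, much like the self-testing procedure described above.]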