Author: Jay Scott
Date: 16:29:42 02/06/98
To create a test suite we need a bunch of positions that we're pretty sure we understand. One day I realized there's an abundant source: opening books. I can think of lots of kinds of test suites that can be made from opening positions:

(1) There are many positions where a set of best moves is known. For example, in the initial position the best moves are considered to be 1. e4, 1. d4, 1. c4 and 1. Nf3. It's not obvious that 1. Nc3 is worse (and some programs will play 1. Nc3 with opening book turned off). Of course, you may want to include only positions with a single best move, but that's not necessary.

(2) There are many unbalanced positions which are thought to be dynamically equal. A test suite could include positions like the King's Gambit (1. e4 e5 2. f4) or the Ruy Lopez Exchange Variation (1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 4. Bxc6), where the sides accept different kinds of advantage, and score a program as "successful" if it evaluates the positions as approximately equal. Of course that means the programs have to return genuine scores, and it'll be hard to get consistent scoring between programs. But this could be a handy way to evaluate changes to a single program.

(3) The problem of comparing scores can be solved by creating a test suite of pairs of positions. After 1. e4 e5, the move 2. f4 is at least as good as 2. d4, so we could feed both positions to a program and score it as "successful" if it evaluates the position after 2. f4 at least as highly. The program still has to return genuine scores. Or we could pair any position known to be bad with any position considered to be equal, and so on.

(4) A test suite could be constructed automatically from a game database: if people reach a position frequently and the results are about even, we can guess that it's an even position and accept the best moves from the database. Uneven positions may be interesting too.
This kind of automatically-generated test suite may not be as reliable as a hand-made one, but it's easy to create. Unlike automatically-generated opening books, we don't have to include every position, and we can narrow the suite down to positions that have convincing statistics. For programs that have played enough games, like Crafty, the game database could be restricted to the program's own games to guarantee relevance--here the test might be to see whether the program's evaluation reflects the outcome statistics of the position.

A test suite made from opening positions would naturally cover both tactical and positional considerations, unlike most test suites. Because the positions have been deeply investigated by many people, there won't be as many disagreements and errors (but you'll never get away from errors altogether).

Disadvantages of test suites made from opening books:

- There won't be many endgame positions. :-)
- Programs can already play the positions well, with their opening books. So arguably the test positions are all irrelevant.
- Human openings are made for humans. Maybe the best moves for a human to play aren't the best for a program.
- To create one by hand, you'll need an opening expert.

Jay