Author: Steffen Jakob
Date: 02:57:54 05/30/01
Hi all,

it seems as if the last modification I made has made Hossa play worse. What I need are some good methods for testing. At the moment my testing is rather chaotic: I observe games at ICC, and if I see something strange I try to fix it. Then I let Hossa play at ICC again and look at how it works. "To see how it works" is surely not a good way of testing, and the ICC rating isn't reliable either. I know from others that they do basically the same. I am not happy with this at all. I would rather have a well-defined strategy for testing changes automatically. Here is some short brainstorming; I would like to get some feedback.

- test suites: I don't like test suites very much for testing, because those positions are mostly very tactical and rather exceptional. Different test suites should be used which emphasize different themes (tactics, endgames, pawn endgames, exchange sacs, ...). Data to compare: #solutions, #nodes, time.

- static eval tests: I am thinking of a set of positions where I don't look for a best move but for a static eval value. An eval range would be assigned to each position, and if the engine's static eval is within this range, then it matches (see the first sketch appended below). Data to compare: #solutions.

- static eval order: this is similar to the point above. Here I want to specify a set of ordered positions, where the "best" position is ordered first etc. This point is interesting: you can test whether the engine prefers certain patterns over others. Data to compare: #solutions.

- effective branching factor test: given is a set of "typical" positions for the opening, middlegame and endgame. The engine searches each position for a fixed time and records the effective branching factor (see the second sketch below). Data to compare: effective branching factors.

- % fail highs on the first move: similar to the above. For the same set of "typical" positions, the percentage of first-move fail highs is measured. Data to compare: %.

- self games: a reasonable number of games is played against older versions of the same program. Which openings? Learning on or off? Maybe the openings should be fixed so that the results are more comparable. Data to compare: score at different time controls.

- matches vs other programs: similar to the above; a reasonable number of games is played against other programs. Data to compare: score at different time controls.

All tests can be done automatically and produce results which can be compared directly.

Greetings,
Steffen.
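P.S.: To make the static eval range test a bit more concrete, here is a minimal sketch of what such a harness could look like. The functions set_position() and static_eval() are only stand-ins for whatever interface the engine exposes, and the positions and ranges are placeholders, not a real test set.

// Minimal sketch of an automated static-eval range test.
// set_position() and static_eval() are dummy stubs so the sketch compiles
// stand-alone; in a real harness they would be the engine's own functions.

#include <cstdio>

static void set_position(const char * /*fen*/) { /* load FEN into the engine */ }
static int  static_eval() { return 0; }          /* eval in centipawns, side to move */

struct EvalTest {
    const char *fen;   // position to evaluate
    int lo, hi;        // acceptable eval range in centipawns
};

static EvalTest tests[] = {
    // illustrative entries only; real positions and ranges must be hand-picked
    { "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1",   -30,   30 },
    { "8/8/8/8/8/5k2/4p3/4K3 b - - 0 1",                            400,  900 },
};

int main() {
    int solved = 0, total = sizeof(tests) / sizeof(tests[0]);
    for (int i = 0; i < total; i++) {
        set_position(tests[i].fen);
        int e = static_eval();
        bool ok = (e >= tests[i].lo && e <= tests[i].hi);
        if (ok) solved++;
        printf("%2d: eval=%5d  range=[%d,%d]  %s\n",
               i, e, tests[i].lo, tests[i].hi, ok ? "ok" : "FAIL");
    }
    printf("#solutions: %d/%d\n", solved, total);
    return 0;
}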
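And the bookkeeping for the effective branching factor test could look something like this. The node counts per iteration are made up for illustration; in practice they would come from the engine's iterative deepening loop. The EBF of one iteration is taken as nodes(d)/nodes(d-1), and the geometric mean over all iterations gives one number per test position.

// Minimal sketch of the effective-branching-factor computation.

#include <cstdio>
#include <cmath>

int main() {
    // hypothetical total node counts after depths 1..8 for one test position
    double nodes[] = { 40, 130, 400, 1300, 4200, 13500, 44000, 140000 };
    int n = sizeof(nodes) / sizeof(nodes[0]);

    double log_sum = 0.0;
    for (int d = 1; d < n; d++)
        log_sum += log(nodes[d] / nodes[d - 1]);

    // geometric mean of the per-iteration ratios
    double ebf = exp(log_sum / (n - 1));
    printf("effective branching factor: %.2f\n", ebf);
    return 0;
}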