Author: Bas Hamstra
Date: 03:41:46 05/30/01
Yes, determining whether you really improved your program is problematic. Interesting post. But probably, even if you use the methods below, you will find mixed results: the new version has a better FH% but a worse branching factor; it does better against A but worse against B; it is stronger at lightning and weaker at long time controls; it plays stronger at the server but does worse on the suites. Etc. Apart from that, it takes a long time to play, say, 100 games at standard time control. You can't just play 3 games (a rough significance calculation below shows why). Sorry if I sound a little pessimistic :-)

Bas

On May 30, 2001 at 05:57:54, Steffen Jakob wrote:

>Hi all,
>
>it seems as if the last modification I made caused Hossa to play
>worse. What I need are some good methods for testing. At the moment my
>testing is rather chaotic. I observe games at ICC, and if I see
>something strange I try to fix it. Then I let Hossa play again at ICC
>and see how it works. "To see how it works" is surely not a good way
>of testing, and the ICC rating isn't reliable either.
>
>I know from others that they do basically the same. I am not happy
>with this at all. I would much rather have a well-defined strategy for
>testing changes automatically. Here is some short brainstorming, and I
>would like to get some feedback.
>
>- test suites (I don't like test suites very much for testing, because
>  those positions are mostly very tactical and rather exceptional).
>  Different test suites should be used which emphasize different
>  themes (tactics, endgames, pawn endgames, exchange sacs, ...).
>  data to compare: #solutions, #nodes, time
>
>- static eval tests: I am thinking of a set of positions where I don't
>  look for a best move but for a static eval value. An eval range would
>  be assigned to each position, and if the engine's static eval is within
>  this range, then it matches (a sketch of such a harness follows below).
>  data to compare: #solutions
>
>- static eval order: this is similar to the point above. Here I want
>  to specify a set of ordered positions. The "best" position is
>  ordered first, etc. This point is interesting. Here you can test
>  whether the engine prefers certain patterns to other patterns.
>  data to compare: #solutions
>
>- effective branching factor test: given is a set of "typical"
>  positions for the opening, middlegame, and endgame. The engine
>  searches each position for a certain time and records the effective
>  branching factor (a sketch follows below).
>  data to compare: effective branching factors
>
>- % fail highs on the first move: similar to the above. For the same
>  set of "typical" positions, the percentage of first-move fail highs
>  is measured.
>  data to compare: %
>
>- self games: a reasonable number of games is played vs. older versions
>  of the same program. Which openings? Learning? Maybe the openings
>  should be fixed so that the results are more comparable (a match
>  driver is sketched below).
>  data to compare: score for different time controls
>
>- matches vs. other programs: as above, a reasonable number of games
>  is played vs. other programs.
>  data to compare: score for different time controls
>
>All tests can be run automatically and produce results which can be
>compared directly.
>
>Greetings,
>Steffen.
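
On the "you can't just play 3 games" point, here is a rough sketch of how wide the error bars on a short match really are. It scores a match from wins/draws/losses, uses a normal approximation for the standard error, and converts the score to an Elo difference with the usual logistic formula; the example game counts are made up for illustration.

import math

def match_stats(wins, draws, losses):
    """Score, standard error and approximate Elo difference for a match."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the score (a game counts 1, 0.5 or 0).
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    stderr = math.sqrt(var / n)
    # Logistic Elo model: expected score -> rating difference.
    elo = -400.0 * math.log10(1.0 / score - 1.0) if 0.0 < score < 1.0 else float("inf")
    return score, stderr, elo

print(match_stats(2, 0, 1))      # 3 games: score 0.67, stderr ~0.27
print(match_stats(55, 20, 25))   # 100 games: score 0.65, stderr ~0.04

At two standard errors, even the 100-game match above only pins the rating difference down to very roughly +/-60 Elo, and a 3-game result tells you essentially nothing, which is exactly the problem Bas describes.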
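For the static eval range test, a minimal harness sketch. The one real assumption is the static_eval() hook, which is hypothetical: how you obtain the engine's static evaluation for a FEN (a debug command, a pipe to the engine, a linked library) depends entirely on the engine. The positions and centipawn ranges below are placeholders, not a real suite.

EVAL_SUITE = [
    # (FEN, lower bound, upper bound) in centipawns from White's point of view.
    ("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1", -30, 30),
    ("8/8/8/8/8/5k2/4p3/4K3 b - - 0 1", -350, -50),
]

def static_eval(fen):
    """Hypothetical hook: return the engine's static eval (centipawns) for fen."""
    raise NotImplementedError("wire this up to your engine")

def run_eval_suite(suite):
    solved = 0
    for fen, lo, hi in suite:
        score = static_eval(fen)
        ok = lo <= score <= hi
        solved += ok
        print(f"{'ok  ' if ok else 'FAIL'} {score:6.0f} [{lo}, {hi}] {fen}")
    print(f"{solved}/{len(suite)} positions within their eval range")
    return solved

The data to compare is then just #solutions, as in the post, and the suite file can be versioned together with the engine so results stay reproducible.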
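For the effective branching factor test, one common way to reduce an iterative-deepening run to a single number is the geometric mean of the node-count ratio between successive iterations; other definitions exist (e.g. the depth-th root of the total node count), so this is just one reasonable choice. The node counts below are invented for illustration; in practice they come from the engine's own search statistics.

import math

def effective_branching_factor(nodes_per_iteration):
    """Geometric mean of the node-count ratio between successive iterations."""
    ratios = [b / a for a, b in zip(nodes_per_iteration, nodes_per_iteration[1:]) if a > 0]
    if not ratios:
        return float("nan")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Cumulative node counts after iterations 1..8 (illustrative numbers only).
nodes = [40, 180, 900, 3800, 15500, 61000, 250000, 1020000]
print(f"EBF ~ {effective_branching_factor(nodes):.2f}")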
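For the self games and matches vs. other programs, a sketch of a driver that plays two engines against each other from a fixed list of openings, so both versions see the same positions with both colors. It assumes the third-party python-chess library and engines that speak UCI; the engine paths, opening lines and time control are placeholders.

import chess
import chess.engine

OPENINGS = [  # fixed openings (UCI moves) so results are comparable
    ["e2e4", "e7e5", "g1f3", "b8c6"],
    ["d2d4", "d7d5", "c2c4", "e7e6"],
]

def play_game(white, black, opening, movetime=1.0):
    board = chess.Board()
    for uci in opening:
        board.push_uci(uci)
    while not board.is_game_over(claim_draw=True):
        engine = white if board.turn == chess.WHITE else black
        result = engine.play(board, chess.engine.Limit(time=movetime))
        board.push(result.move)
    return board.result(claim_draw=True)

if __name__ == "__main__":
    old = chess.engine.SimpleEngine.popen_uci("./hossa-old")   # placeholder paths
    new = chess.engine.SimpleEngine.popen_uci("./hossa-new")
    score = 0.0
    for opening in OPENINGS:
        for white, black, sign in ((new, old, 1), (old, new, -1)):
            result = play_game(white, black, opening)
            # Count the result from the new version's point of view.
            if result == "1-0":
                score += 0.5 + 0.5 * sign
            elif result == "0-1":
                score += 0.5 - 0.5 * sign
            else:
                score += 0.5
    print(f"new version scored {score}/{2 * len(OPENINGS)}")
    old.quit()
    new.quit()

An engine that only speaks the xboard/WinBoard protocol would need a different driver, and in practice you would repeat the whole match at several time controls, as suggested in the post.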