Author: martin fierz
Date: 07:42:48 05/30/01
On May 30, 2001 at 05:57:54, Steffen Jakob wrote:

>it seems as if the last modification I made made Hossa play
>worse. What I need are some good methods for testing. At the moment my
>testing is rather chaotic. I observe games at ICC and if I see
>something strange I try to fix it. Then I let Hossa play again at ICC
>and look how it works. "to see how it works" is surely not a good way
>for testing and the ICC rating also isn't reliable.

all of these ideas sound good. for my checkers program i have 3 different tests which i run for every change:

1) a test set with ~100 positions (opening / midgame / endgame). for each of these, i measure the nodes needed to reach depth 13 and depth 19, and the total time for the test. i use depth 13 & 19 because in the shallow search i do not fill the hashtable, while in the deeper search i fill it completely and move ordering is no longer dominated by hash moves. this test shows me how my changes in move ordering change the program's performance (or what happens if i turn on ETC or switch from MTD to PVS or whatever). this one takes about 1 hour.

2) a set of positions where there are good but not obvious moves - not a 'mate in 9' or so, but positions where you should sacrifice material for the position, or avoid making a move which is positionally unsound. many of these positions are from games i have seen my program lose with a weak move, where the refutation is way beyond the horizon, but with a good evaluation it will not play the weak move. this test is designed to catch errors in the evaluation, although it is a bit subjective. this one takes about 1 hour too.

3) 'the proof of the pudding' - if i am satisfied with the changes i have made, based on the first two tests, i run an engine match against another checkers engine (not an old version of my program - i'm afraid that this generates incest) over 288 games: that's the 144 standard checkers openings, each played with both colors. that's enough for good statistics. this takes a whole day at 5s/move, so i cannot test every small change like this. but in the end i think this is the most meaningful test, and if the change fails here, i discard it.

cheers
martin
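as a rough illustration of what "enough for good statistics" in test 3 can mean, here is a minimal sketch that turns a match result into a score, an approximate 95% interval and an elo difference. it is not taken from the post: the function names, the normal approximation and the win/draw/loss counts in the usage lines are all assumptions made up for the example.

```python
import math


def match_summary(wins, draws, losses):
    """Return (score, stderr, (lo, hi)) for a match result.

    score is the fraction of points won (win = 1, draw = 0.5, loss = 0).
    The interval is an approximate 95% interval from a normal
    approximation, which is an assumption of this sketch.
    """
    games = wins + draws + losses
    if games == 0:
        raise ValueError("no games played")
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result around the mean score.
    variance = (
        wins * (1.0 - score) ** 2
        + draws * (0.5 - score) ** 2
        + losses * (0.0 - score) ** 2
    ) / games
    stderr = math.sqrt(variance / games)
    margin = 1.96 * stderr
    return score, stderr, (score - margin, score + margin)


def elo_diff(score):
    """Elo difference implied by an expected score (logistic Elo model)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


# Hypothetical 288-game result (144 openings, both colors); invented numbers.
score, stderr, (lo, hi) = match_summary(wins=40, draws=220, losses=28)
print(f"score {score:.3f}, 95% interval {lo:.3f} .. {hi:.3f}")
print(f"elo difference {elo_diff(score):+.1f}")
```

if the interval still contains 0.5, the match has not shown that the change makes the program stronger or weaker. a game with many draws has a small per-game variance, so the interval gets narrow faster than it would with decisive results only.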
Last modified: Thu, 15 Apr 21 08:11:13 -0700