Computer Chess Club Archives


Subject: Re: methods for testing changes

Author: martin fierz

Date: 07:42:48 05/30/01

On May 30, 2001 at 05:57:54, Steffen Jakob wrote:

>it seems as if the last modification I made made Hossa play
>worse. What I need are some good methods for testing. At the moment my
>testing is rather chaotic. I observe games at ICC and if I see
>something strange I try to fix it. Then I let Hossa play again at ICC
>and see how it works. "to see how it works" is surely not a good way
>of testing, and the ICC rating isn't reliable either.

all of these ideas sound good. for my checkers program i have 3
different tests which i run for every change:

1) a test set with ~100 positions (opening / midgame / endgame). for each
of these, i measure the nodes needed to reach depth 13 and depth 19, and the
total time for the test. i use depths 13 and 19 because the shallow search
does not fill the hashtable, while the deeper search fills it completely,
so move ordering is no longer dominated by hash moves.
this test shows me how changes to move ordering affect the program's
performance (or what happens if i turn on ETC (enhanced transposition
cutoffs), switch from MTD to PVS, or whatever).
this one takes about 1 hour.

2) a set of positions with good but not obvious moves: not a 'mate in 9'
or the like, but positions where you should sacrifice material for the
position, or avoid a move that is positionally unsound.
many of these positions come from games in which i have seen my program
lose with a weak move, where the refutation is far beyond the horizon,
but with a good evaluation it will not play the weak move in the first place.
this test is designed to catch errors in the evaluation, although it is
a bit subjective.
this one takes about 1 hour too.

3) 'the proof of the pudding': if the first two tests leave me satisfied
with the changes i have made, i run an engine match against another
checkers engine (not an old version of my program; i'm afraid that this
generates incest) over 288 games, i.e. the 144 standard checkers openings
played with colors reversed. that's enough games for good statistics. at
5 s/move this takes a whole day, so i cannot test every small change like
this. but in the end i think this is the most meaningful test, and if a
change fails here, i discard it.
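For the first test, the core comparison is node counts per position between the old and the new version at the same fixed depth. A minimal sketch in Python, assuming the engine has already produced one node count per position for each version; the position names and the `compare_node_counts` helper are hypothetical, not taken from any real program:

```python
import math
from typing import Dict, List, Mapping


def compare_node_counts(baseline: Mapping[str, int],
                        candidate: Mapping[str, int]) -> Dict[str, object]:
    """Compare node counts of two program versions searched to the same fixed depth.

    Both mappings go from a (hypothetical) position name to the number of nodes
    searched; only positions present in both are compared.
    """
    common = sorted(set(baseline) & set(candidate))
    if not common:
        raise ValueError("no positions in common")
    # Ratio of the summed node counts: dominated by the biggest trees.
    total_ratio = (sum(candidate[p] for p in common)
                   / sum(baseline[p] for p in common))
    # Geometric mean of per-position ratios: every position counts equally.
    mean_log = sum(math.log(candidate[p] / baseline[p]) for p in common) / len(common)
    # Positions where the new version needed more nodes than the old one.
    worse: List[str] = [p for p in common if candidate[p] > baseline[p]]
    return {
        "total_ratio": total_ratio,
        "geomean_ratio": math.exp(mean_log),
        "positions_worse": worse,
    }
```

The geometric mean is reported next to the plain total because a few positions with huge trees can swamp the total, while the geometric mean weighs every position equally. Calling it once with the depth-13 counts and once with the depth-19 counts keeps the two hashtable regimes apart.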
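The second test can be scored mechanically once each position is annotated with the move(s) the program should play and/or the move it must avoid, similar in spirit to the bm/am tags of chess EPD test suites. A minimal sketch, assuming the program's chosen move for each position has been collected into a dict; `SuitePosition`, `score_suite`, and the move strings are hypothetical:

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Mapping, Sequence, Tuple


@dataclass(frozen=True)
class SuitePosition:
    name: str
    # Acceptable moves; empty means "any move not listed in avoid".
    best: FrozenSet[str] = frozenset()
    # Moves the program must not play (e.g. the positionally unsound one).
    avoid: FrozenSet[str] = frozenset()


def passes(pos: SuitePosition, move: str) -> bool:
    """True if the chosen move is acceptable for this test position."""
    if move in pos.avoid:
        return False
    return not pos.best or move in pos.best


def score_suite(suite: Sequence[SuitePosition],
                chosen: Mapping[str, str]) -> Tuple[int, List[str]]:
    """Return (number solved, names of failed positions).

    A position with no recorded move counts as a failure.
    """
    failures = []
    for pos in suite:
        move = chosen.get(pos.name)
        if move is None or not passes(pos, move):
            failures.append(pos.name)
    return len(suite) - len(failures), failures
```

Keeping the list of failed positions, not just the count, makes it easy to see which positions a given evaluation change fixed or broke between runs.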
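To put a number on "good statistics" for the third test, a match result can be turned into a score, an Elo difference, and a rough 95% error bar. A minimal sketch, assuming the usual logistic Elo model and treating all games as independent (which ignores the pairing of each opening's two games with colors reversed); `elo_from_score` and `match_summary` are hypothetical helpers:

```python
import math


def elo_from_score(score: float) -> float:
    """Elo difference implied by an expected score under the logistic model."""
    if score <= 0.0:
        return -math.inf
    if score >= 1.0:
        return math.inf
    return -400.0 * math.log10(1.0 / score - 1.0)


def match_summary(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (score, elo, elo_low, elo_high) from one engine's point of view.

    z = 1.96 gives an approximate 95% interval; games are treated as independent.
    """
    n = wins + draws + losses
    if n == 0:
        raise ValueError("no games played")
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    return (score,
            elo_from_score(score),
            elo_from_score(score - z * se),
            elo_from_score(score + z * se))
```

Since the match is played against another engine, a change would be judged by running the old and the new version against the same opponent and comparing the two summaries; if their intervals overlap heavily, 288 games have not clearly separated the versions.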

cheers
  martin



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.