Computer Chess Club Archives


Subject: Re: methods for testing changes

Author: Steffen Jakob

Date: 05:38:17 05/30/01



On May 30, 2001 at 06:41:46, Bas Hamstra wrote:

>Yes, determining if you did improve your program is problematic. Interesting
>post. But probably even if you use the below method you will find mixed results.
>It has a better FH% and a worse branching factor. It does better against A but
>worse against B. Stronger at lightning and weaker at long tc's. It plays stronger
>at the server though it does worse on the suites. Etc. Apart from that it takes
>a long time to play for example 100 standard games. You can't just play 3 games.
>Sorry if I sound a little pessimistic :-)

That isn't pessimistic. Mixed results are OK. It's then up to my interpretation
whether I like them or not. But at least I have some data.

Greetings,
Steffen.

>Bas.
>
>
>On May 30, 2001 at 05:57:54, Steffen Jakob wrote:
>
>>Hi all,
>>
>>it seems as if the last modification I made made Hossa play
>>worse. What I need are some good methods for testing. At the moment my
>>testing is rather chaotic. I observe games at ICC and if I see
>>something strange I try to fix it. Then I let Hossa play again at ICC
>>and watch how it works. "To see how it works" is surely not a good way
>>of testing, and the ICC rating isn't reliable either.
>>
>>I know from others that they do basically the same. I am not happy
>>with this at all. I would rather see a well-defined strategy for
>>testing changes automatically. Here is some short brainstorming,
>>and I would like to get some feedback.
>>
>>- test suites (I don't like test suites very much for testing, because
>>  those positions are mostly very tactical and rather
>>  exceptional). Different test suites should be used which
>>  emphasize different themes (tactics, endgames, pawn endgames,
>>  exchange sacs, ...)
>>  data to compare: #solutions, #nodes, time
>>
>>- static eval tests: I think of a set of positions where I don't look
>>  for a best move but for a static eval value. An eval range would be
>>  assigned to each position, and if the engine's static eval is within
>>  this range, then it matches.
>>  data to compare: #solutions
>>
>>- static eval order: this is similar to the point above. Here I want
>>  to specify a set of ordered positions. The "best" position is
>>  ordered first, etc. This point is interesting. Here you can test
>>  whether the engine prefers certain patterns to other patterns.
>>  data to compare: #solutions
>>
>>- effective branching factor test: a set of "typical" positions for
>>  opening, middlegame, and endgame is given. The engine searches each
>>  position for a certain time and writes down the effective branching
>>  factor.
>>  data to compare: effective branching factors
>>
>>- % fail highs on the first move: similar to the above. For the same set
>>  of "typical" positions, the percentage of first-move fail highs is measured.
>>  data to compare: %
>>
>>- self games: a reasonable number of games is played against older versions
>>  of the same program. Which openings? Learning? Maybe the openings
>>  should be given so that the results are more easily comparable.
>>  data to compare: score for different time controls
>>
>>- matches vs other programs: similar to the above. A reasonable number of
>>  games is played against other programs.
>>  data to compare: score for different time controls
>>
>>
>>All tests can be done automatically and produce results which can
>>be compared directly.
>>
>>Greetings,
>>Steffen.
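
To make the static eval tests above a bit more concrete, here is a minimal C
sketch of both the range test and the order test. set_position() and evaluate()
are assumed hooks into the engine (not Hossa's actual interface), and the
example data would have to be supplied by hand:

#include <stdio.h>

/* assumed engine hooks; placeholders, not real Hossa functions */
extern void set_position(const char *fen);   /* load a position from FEN  */
extern int  evaluate(void);                  /* static eval in centipawns */

typedef struct {
    const char *fen;
    int lo, hi;          /* accepted eval range for this position */
} EvalCase;

/* range test: count positions whose static eval falls inside its range */
int run_eval_range_test(const EvalCase *cases, int n)
{
    int matched = 0;
    for (int i = 0; i < n; i++) {
        set_position(cases[i].fen);
        int score = evaluate();
        if (score >= cases[i].lo && score <= cases[i].hi)
            matched++;
        else
            printf("miss: %s -> %d (wanted %d..%d)\n",
                   cases[i].fen, score, cases[i].lo, cases[i].hi);
    }
    return matched;      /* #solutions */
}

/* order test: fens[] is given best-first, so each static eval should be
 * at least as high as the next one; count adjacent pairs in order */
int run_eval_order_test(const char **fens, int n)
{
    int ok = 0, prev = 0;
    for (int i = 0; i < n; i++) {
        set_position(fens[i]);
        int score = evaluate();
        if (i > 0 && prev >= score)
            ok++;
        prev = score;
    }
    return ok;           /* #solutions, out of n - 1 pairs */
}

The two return values are exactly the #solutions numbers one would log and
compare between versions.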

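The effective branching factor test only needs the node counts of successive
iterations; one common way to compute it is nodes(d) / nodes(d-1). A
self-contained sketch with invented node counts:

#include <stdio.h>

int main(void)
{
    /* nodes[i] = total nodes searched to finish iteration i + 1
     * (invented sample numbers, e.g. from a fixed-time search)   */
    double nodes[] = { 2.1e3, 7.9e3, 3.2e4, 1.1e5, 4.0e5, 1.5e6 };
    int n = sizeof(nodes) / sizeof(nodes[0]);

    for (int d = 1; d < n; d++)
        printf("depth %2d: EBF = %.2f\n", d + 1, nodes[d] / nodes[d - 1]);
    return 0;
}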

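For the percentage of first-move fail highs, two counters inside the search are
enough; the names below are made up, only the bookkeeping matters:

/* bump these wherever the search detects a beta cutoff */
static long fail_highs = 0;             /* all beta cutoffs          */
static long first_move_fail_highs = 0;  /* cutoffs on the first move */

/* call at the point where score >= beta; move_index counts moves tried */
void record_fail_high(int move_index)
{
    fail_highs++;
    if (move_index == 0)
        first_move_fail_highs++;
}

/* the number to log and compare between versions */
double first_move_fail_high_pct(void)
{
    return fail_highs ? 100.0 * first_move_fail_highs / fail_highs : 0.0;
}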

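For the self games and the matches against other programs, the raw score can be
turned into an approximate Elo difference with the usual logistic formula; the
match result below is invented. With something like 100 games the error margin
is still large (dozens of Elo points), which fits Bas' warning about mixed
results:

#include <math.h>
#include <stdio.h>

int main(void)
{
    int wins = 34, draws = 42, losses = 24;        /* invented 100-game match */
    int games = wins + draws + losses;

    double s   = (wins + 0.5 * draws) / games;     /* score fraction          */
    double elo = -400.0 * log10(1.0 / s - 1.0);    /* logistic Elo model      */

    printf("score %.1f%%  ->  about %+.0f Elo\n", 100.0 * s, elo);
    return 0;
}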