Computer Chess Club Archives



Subject: Re: Serious lack of good test methodology

Author: Joachim Rang

Date: 02:11:07 06/20/04



On June 20, 2004 at 03:28:36, Cesar Contreras wrote:

>I have a serious problem testing improvements in my chess engine.
>
>I use test suites to find bugs and to establish comparisons between different
>versions of my engine. I know that I can't use test suites to calculate Elo;
>that's clear to me.
>
>But I think there could be a test that gives me an approximation (with an acceptable
>error margin) of the comparative strength of different versions of my engine with
>different modifications. I don't plan to cheat; that's why I'm not going to try
>to tune my engine to solve the test.
>
>I think I can draw an analogy with IQ tests: several aspects are evaluated
>in IQ tests, covering different kinds of intelligence. Or with a psychological test, which evaluates a
>lot of things about anybody. Both have an error margin and are affected by
>cheating, environment, time, sickness, etc. But it's up to the person who runs the
>test to try to avoid such problems.
>
>Such a test could give not only a strength comparison but also other information, like
>performance in the opening, middlegame and endgame, or maybe something more specific
>like the degree of care for pawn structure, mobility or king safety.
>
>Maybe there could be different versions of the test, each with a different
>number of positions and its own error margin.
>
>A good test suite especially oriented to chess programmers could be really great;
>again, not looking for perfect results, but for approximations useful to us, and
>giving not just one number but maybe several numbers evaluating several
>aspects.
>
>Is there any test that does that?
>Or is it just a dream?

I think it is. There are several test suites out there that claim to cover certain
aspects, or all aspects, of the game. But I think no test suite can compete
with the accuracy you get if you run a gauntlet against 10 different
opponents from 20 starting positions, playing each position once with each color,
which means 400 games in total. Even then the margin of error is still large, and
a strength improvement of 5-10 Elo is extremely difficult to verify, but I think
any test suite has a considerably larger margin of error than such a gauntlet.
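To put numbers on that margin of error, here is a minimal Python sketch (not Joachim's tool; the function names are hypothetical) that turns a gauntlet's wins, draws and losses into an Elo difference with an approximate 95% interval. It assumes the standard logistic Elo curve and treats the mean per-game score as normally distributed, which is only a reasonable approximation for a few hundred games or more.

```python
import math


def _clamp(p: float, eps: float = 1e-6) -> float:
    # Keep probabilities away from 0 and 1 so the Elo value stays finite.
    return min(max(p, eps), 1.0 - eps)


def elo_from_score(score: float) -> float:
    """Elo difference implied by an expected score, logistic model."""
    score = _clamp(score)
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, lower, upper) for a match result.

    z = 1.96 gives an approximate 95% interval under a normal
    approximation of the mean per-game score.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Sample variance of a single game's score (1, 0.5 or 0 points).
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    stderr = math.sqrt(var / n)
    low = elo_from_score(score - z * stderr)
    high = elo_from_score(score + z * stderr)
    return elo_from_score(score), low, high


if __name__ == "__main__":
    elo, low, high = elo_with_margin(130, 160, 110)
    print(f"{elo:+.1f} Elo, 95% interval [{low:+.1f}, {high:+.1f}]")
```

For example, 130 wins, 160 draws and 110 losses in 400 games give about +17 Elo, with a 95% interval from roughly -9 to +44 Elo, which shows why a 5-10 Elo change cannot be confirmed at that sample size.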

This applies to all basic evaluation changes. If you modify rare
aspects of the game (for example, specific endgame knowledge for opposite-colored
bishop endings), you might try test positions to see whether your ideas are
working, but I would suggest also running a tournament from, say, 10 starting
positions that are opposite-colored bishop endings, to see whether your change
helps across a whole ending or only in specific positions.
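A schedule like the ones described, where the tested engine meets every opponent from every start position once with each color, can be generated mechanically. This Python sketch uses hypothetical names (Game, gauntlet_schedule) and represents positions as plain strings such as FENs; the same function covers the themed case, e.g. the new version against the old one on 10 opposite-colored bishop endings.

```python
from itertools import product
from typing import List, NamedTuple


class Game(NamedTuple):
    white: str
    black: str
    position: str  # start position, e.g. a FEN string or an identifier


def gauntlet_schedule(engine: str, opponents: List[str],
                      positions: List[str]) -> List[Game]:
    """Pair the tested engine with every opponent from every position,
    once with White and once with Black, so an unbalanced start
    position does not favor either side."""
    games: List[Game] = []
    for opponent, position in product(opponents, positions):
        games.append(Game(engine, opponent, position))  # engine has White
        games.append(Game(opponent, engine, position))  # colors reversed
    return games


if __name__ == "__main__":
    opponents = [f"opponent{i}" for i in range(10)]
    positions = [f"start{j}" for j in range(20)]
    print(len(gauntlet_schedule("new", opponents, positions)))  # 400
```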

Changes that aim to reduce the size of the tree, like move ordering or
aspiration windows, can perhaps be tested efficiently with test suites, but that's
only a guess.
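For tree-size changes such as move ordering, one way to use a test suite is to search every position to the same fixed depth with the old and new versions and compare node counts. A minimal Python sketch, assuming you have already collected those counts (the sample data and the function name are hypothetical); the geometric mean keeps a few positions with huge trees from dominating the result. The comparison is only fair if the change should not alter the search result at that depth, which holds for plain alpha-beta but not necessarily once reductions or pruning depend on move order.

```python
import math
from typing import Sequence


def node_ratio(old_nodes: Sequence[int], new_nodes: Sequence[int]) -> float:
    """Geometric mean of new/old node counts over the same positions.

    Both versions must have searched each position to the same fixed
    depth. A result below 1.0 means the new version needs fewer nodes.
    """
    if len(old_nodes) != len(new_nodes) or not old_nodes:
        raise ValueError("need node counts for the same non-empty set of positions")
    log_sum = sum(math.log(new / old) for old, new in zip(old_nodes, new_nodes))
    return math.exp(log_sum / len(old_nodes))


if __name__ == "__main__":
    # Hypothetical node counts for a 4-position suite at a fixed depth.
    old = [1_200_000, 850_000, 4_300_000, 96_000]
    new = [1_050_000, 800_000, 3_100_000, 99_000]
    print(f"new/old node ratio: {node_ratio(old, new):.3f}")
```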

regards Joachim


>If it is a dream, please share your method for testing your engine after
>modifications. We make a large number of modifications to our engines, so the
>time spent on testing is really large, maybe 90% (I'm not sure about that; what do
>you say?)
>
>There are several problems with running tournaments to measure the performance
>of the engine:
>- Selecting an appropriate number of engines
>- Selecting the engines
>- Number of games
>- Gauntlet, round robin or Swiss
>- Time control
>And after running the tournament, how do I know the approximate error margin of
>the result, or the performance of the engine in several aspects (performance in
>the opening, middlegame and endgame, care of positional aspects)?
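On the error-margin question, one quick check that complements an Elo interval is the likelihood of superiority (LOS): the probability that the tested version really is stronger, given its wins and losses. The Python sketch below uses the common normal approximation LOS = 0.5 * (1 + erf((W - L) / sqrt(2(W + L)))), in which draws carry no information; the function name is hypothetical.

```python
import math


def likelihood_of_superiority(wins: int, losses: int) -> float:
    """Approximate probability that the engine is stronger than its
    opposition, from decisive games only (draws are ignored)."""
    decisive = wins + losses
    if decisive == 0:
        return 0.5  # no decisive games, no evidence either way
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * decisive)))


if __name__ == "__main__":
    print(f"LOS: {likelihood_of_superiority(130, 110):.1%}")
```

With 130 wins and 110 losses the LOS comes out at about 90%, so even a 400-game gauntlet with a visible plus score leaves real doubt about whether the change helped.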




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.