Computer Chess Club Archives


Subject: There is no Testsuite for Engines during their Time of Development

Author: Rolf Tueschen

Date: 03:48:38 06/21/04


On June 20, 2004 at 03:28:36, Cesar Contreras wrote:

>I have a serious problem testing improvements in my chess engine.
>
>I use test suites to find bugs and to establish comparisons between different
>versions of my engine. I know that I can't use test suites to calculate Elo,
>that's clear to me.
>
>But I think there can be a test that gives me an approximation (with an acceptable
>error margin) of the comparative strength of different versions of my engine with
>different modifications.

You write you think there can be a test ... but this is exactly what is wrong.
There is none.


>I don't plan to cheat, that's why I'm not going to try
>to tune my engine to solve the test.
>
>I think I can make an analogy with IQ tests: several aspects, and different
>intelligences, are evaluated in IQ tests. Or with a psychological test that
>evaluates a lot of things about a person. Both have an error margin and are
>affected by cheating, environment, time, sickness, etc. But it's up to the person
>who runs the test to try to avoid such problems.


You write that you think that you can make an analogy to IQ tests ... but this
is also wrong.



>
>That test could give not only a strength comparison, but also some info, like
>performance in the opening, middlegame, endgame, or maybe something more specific
>like the degree of care for pawn structure, mobility, king safety.


Yes, this is exactly what all the programmers are dreaming about, but it doesn't
exist, and if you think you could make such a test then you are mistaken.



>
>Maybe there can be different versions of the test, each one with a different
>number of positions, giving an error margin for each version.
>
>A good test suite especially oriented to chess programmers could be really great,
>again, not looking for perfect results, but approximations useful for us, and
>not only giving a number, but maybe several numbers evaluating several
>aspects.


You write "not exact but approximate", but this is exactly the problem: how
approximate can the result be and still let you make a sound decision?


>
>Is there any test that does that?
>Or is it just a dream?


It is a dream.


>If it is a dream, please share your method for testing your engine after
>modifications. The number of modifications we make to our engines is big, so the
>time spent on testing is really big, maybe 90% (not sure about it, what do
>you say?)

Oh, they won't tell you that, and it is the same with their code. Nobody
will tell you.




>
>There are several problems with running tournaments to measure the performance
>of the engine:
>- Selecting an appropriate number of engines
>- Selecting the engines
>- Number of games
>- Gauntlet, round robin or Swiss
>- Time control
>And after running the tournament, how do I know the approximate error margin of
>the result? Or the performance of the engine in several aspects (performance in
>the opening, middlegame, endgame, care for positional aspects)?


You must realise that being the programmer of a new engine also requires talent
in testing. That's the whole secret. You can try to improve your current
knowledge. Don't wait for publications with the solution to your problem. The
good programmers have their own methods, and the not-so-good ones will use
existing test suites, with less success in tournaments for sure. All the good
programmers here have already stated that they don't work with test suites, not
because they are pigheaded but because test suites don't "work".
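
On the quoted question about the error margin of a tournament result: below is
a minimal sketch, assuming the usual logistic Elo model and a normal
approximation to the match score. The function names (elo_from_score,
match_elo) and the 95% default (z = 1.96) are illustrative assumptions, not
any poster's actual tool.

```python
import math


def elo_from_score(score):
    """Convert a score fraction in (0, 1) to an Elo difference (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def match_elo(wins, draws, losses, z=1.96):
    """Return (elo, low, high) for one match result.

    Assumption: games are independent, and the mean score per game is
    approximately normally distributed (reasonable only for many games).
    """
    n = wins + draws + losses
    if n == 0:
        raise ValueError("no games played")
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the outcome (1, 0.5 or 0 points).
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    stderr = math.sqrt(var / n)

    # Clamp so the logistic conversion stays finite at 0% or 100%.
    def clamp(s):
        return min(max(s, 1e-6), 1.0 - 1e-6)

    return (elo_from_score(clamp(score)),
            elo_from_score(clamp(score - z * stderr)),
            elo_from_score(clamp(score + z * stderr)))


if __name__ == "__main__":
    elo, low, high = match_elo(wins=50, draws=0, losses=50)
    print(f"Elo {elo:+.1f}  (95% interval {low:+.1f} .. {high:+.1f})")
```

Under these assumptions, an even 100-game match without draws leaves an
interval of roughly +/-70 Elo, so a small improvement between two versions
needs far more games before the result means anything.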

Finally, let me inform you, without being arrogant, that there is no lack of
good test methodology, because test theory is a well-established field at
universities. What you are missing is a good test suite, and the answer is - - -
there is no such test suite, because it's a contradiction in itself. You build a
chess engine so that it can play chess. But chess isn't just puzzle solving.
Worse - - engines that are good at such test suites can well be weak in
tournaments. I'm talking about engines whose playing strength is still unknown!
The many testers around who run test suites on famous chess engines work with
*already* perfectly tuned, "well playing" tournament engines.

The last paragraph is a bit difficult to understand, so you should ask if you
have questions. Like all the readers, of course. For the experts I want to say
that this little difference above is, for many computer chess fans, the main
reason they waste their precious time. They really do think that with their test
suite testing they can reveal something important, which is nonsense, because
their testing wouldn't function at all if you, the experts, hadn't prepared your
new engine versions so well. It would help a lot if you could give some
information about the possible integration of test suite solutions in your
engines...

As SMK of SHREDDER declared: "I could add a couple of test stuff so that
nobody but me on this Earth could detect it". We must be thankful that at least
sometimes someone drops a few hints, but I doubt that they are fully
understandable to laypeople. So they are surprised and satisfied that "their"
test results perfectly resembled the ones from tournament tests... Yes, what a
surprise! ;)



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.