Computer Chess Club Archives


Subject: Re: Criteria for Good Test Positions = ?

Author: Mike S.

Date: 21:11:41 12/08/02


On December 08, 2002 at 22:15:46, Bob Durrett wrote:

>I am searching for test positions which could be used to test the positional
>chess abilities of chess engines.
>
>Is it absolutely essential that:
>
>(a) there be only one move which is best?

Yes. The best move should be clearly better than the second best.

It's not necessary that only the best move wins. Some may think it's enough for
a very good engine to win (no matter how, i.e. with 2nd best moves) rather than
to find the best moves. I think it's not. Very good engines are those which find
the best moves more often, IMO. You see, what makes a good test position also
depends on one's personal criteria for what a good engine is, or even on one's
personal approach to chess as a whole.

People who are more interested in tournament results and rating lists may be
satisfied with 2nd and 3rd best moves, as long as the engine scores a good
percentage. I'm not.
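
The requirement under (a), that the best move be clearly better than the second
best, could be checked roughly as follows. This is only a sketch: the function
name, the 100-centipawn margin and the example scores are assumptions made for
illustration, and the scores are taken to come from some multi-variation
analysis of the candidate position.

```python
def has_clear_best_move(scores, min_margin=100):
    """Return True if the top move beats the runner-up by at least min_margin.

    scores: dict mapping a move (str) to its evaluation in centipawns,
    from the point of view of the side to move. Hypothetical input.
    min_margin: assumed threshold; what counts as "clearly better" is a
    judgment call.
    """
    if len(scores) < 2:
        # A single candidate move cannot be compared with a second best.
        return False
    ranked = sorted(scores.values(), reverse=True)
    return ranked[0] - ranked[1] >= min_margin


# Hypothetical evaluations for two candidate test positions.
clear = {"Rxh7+": 350, "Qd2": 40, "a3": 10}
unclear = {"Nf5": 60, "Qd2": 45, "a3": 10}
print(has_clear_best_move(clear))    # True
print(has_clear_best_move(unclear))  # False
```

Note that such a check says nothing about whether the best move wins; it only
measures the gap to the second best move.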

>(b) the best move be a winning move?

Not necessarily. It could be a move which avoids a trap (not taking material
which is currently undefended), or the toughest defense although it loses too,
or a drawing idea in a bad position...

My Quicktest test suite contains examples of all these types.
http://meineseite.i-one.at/PermanentBrain/quick/quicke.htm

>Could someone please suggest a set of criteria?

(c) The first move of the solution should have a "testing character". This
means: The first move should be unusual (a move that would normally be bad), so
that it's only chosen when the engine has seen the idea behind the move.
Sacrifices are easy examples of that.

If the solution starts with a "normal" move, which sacs nothing or refutes no
sac etc., there is a big danger that it's just chosen by luck. You cannot easily
tell if the engine has seen the idea in such cases.

I prefer positions where the first move is like a proof that the engine has
really solved the position, without having to look at the evaluation or the
variation. Including evals and pv in the test will often lead to doubtful or
unclear situations. I recommend avoiding that for a good test.
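
Judging a position by the first move alone can be sketched like this. The log
format and the rule that the engine must still prefer the solution move when
the time runs out are assumptions made for this example; evaluations and the pv
are deliberately not looked at.

```python
def solving_time(move_log, solution, time_limit):
    """Return the time (in seconds) from which the solution move is kept, or None.

    move_log: chronological list of (seconds, move) pairs, one entry per
    change of the engine's preferred first move (hypothetical format).
    solution: the first move of the intended solution.
    time_limit: allotted time per position in seconds.
    """
    solved_at = None
    for seconds, move in move_log:
        if seconds > time_limit:
            break  # changes after the time limit do not count
        if move == solution:
            if solved_at is None:
                solved_at = seconds
        else:
            solved_at = None  # the engine dropped the solution again
    return solved_at


# Hypothetical log: the engine finds the sacrifice, drops it, finds it again.
log = [(1, "Qd2"), (7, "Rxh7+"), (15, "Qd2"), (22, "Rxh7+")]
print(solving_time(log, "Rxh7+", 60))  # 22
print(solving_time(log, "Rxh7+", 20))  # None
```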

It is very important for a good test suite, that it's understood what the
"testing character" requires. Don't think it's sufficient just to have one
clearly best move. It's not.

Probably, testing experience is required to understand that better (I have been
testing computers and programs for ~15 years).

(d) The position should not allow the idea to be *delayed* with in-between
moves, i.e. by first playing a primitive but strong threat the opponent must
respond to, and only then the intended idea. This can destroy the quality of a
test position which would be ok otherwise, and is a big problem in positional
tests (where things, e.g. move order, are most often not as "forced" as in
combinations).

(e) Either a convincing solution variation should be included with the test, for
comparison (this should always be possible with tactical tests), or at least the
original continuation from the game (in cases where a good positional test was
found, which is very difficult due to the (c)+(d) requirements).

People who compile a test suite should also consider adding some explanatory
text for very difficult positions. Sometimes it's easier to see what's going on
when it's explained in words (not everybody is a 2000+ player).

(f) The test suite should have a reasonable degree of difficulty. This is very
dependent on the current level of strength and hardware speed of course (and on
the intended time per position). It's good neither when all positions are solved
in a few seconds each, nor when only a few are solved at all.

In the Quicktest (1 min./pos.), currently 2/3 to 3/4 will be solved by good
engines. This provides a good amount of data to compare the engines'
performances, and the test can be used for one or two years... until it becomes
too easy for the future soft- and hardware generations.
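
As a rough illustration of (f), the share of solved positions can be compared
with the 2/3 to 3/4 range mentioned above. Treating that range as a target band
for other suites is an assumption of this sketch, and the suite size of 30
positions in the example is hypothetical.

```python
from fractions import Fraction


def difficulty_verdict(solved, total, low=Fraction(2, 3), high=Fraction(3, 4)):
    """Classify a suite result relative to an assumed target band of solve rates."""
    rate = Fraction(solved, total)  # exact arithmetic, no rounding at the edges
    if rate < low:
        return "below band (suite may be too hard)"
    if rate > high:
        return "above band (suite may be too easy)"
    return "within band"


# Hypothetical results of a good engine on a 30-position suite.
print(difficulty_verdict(21, 30))  # within band
print(difficulty_verdict(28, 30))  # above band (suite may be too easy)
print(difficulty_verdict(10, 30))  # below band (suite may be too hard)
```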

(g) The ambitious test suite designer will try to find positions which aren't
used already in other tests.
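
Checking a candidate against positions from other suites could be done along
these lines, assuming the positions are available as FEN or EPD strings (an
assumption; the post does not say which format is used). Only the first four
fields are compared (piece placement, side to move, castling rights, en passant
square), so differing move counters or EPD operations do not hide a duplicate.
The function names are hypothetical.

```python
def position_key(fen):
    """Reduce a FEN or EPD string to the four fields that identify the position."""
    return " ".join(fen.split()[:4])


def already_used(candidate_fen, existing_fens):
    """Return True if the candidate position occurs among the existing ones."""
    known = {position_key(f) for f in existing_fens}
    return position_key(candidate_fen) in known


# The position after 1.e4, once as FEN and once as EPD with an id operation.
other_suite = ['rnbqkbnr/pppppppp/8/8/4P3/8/PPPPPPPP/RNBQKBNR b KQkq e3 id "x.1";']
candidate = "rnbqkbnr/pppppppp/8/8/4P3/8/PPPPPPPP/RNBQKBNR b KQkq e3 0 1"
print(already_used(candidate, other_suite))  # True
```

Sources that write the en passant field differently for the same position would
not be matched by this simple comparison.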

The test suite designer shouldn't hesitate to publish detailed results from his
suite. These should include a description of the technical conditions (CPU speed
etc.) and the solving times for the engines he has tested.
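
A published result could, for example, be laid out as in the following sketch.
The engine names, the hardware description and the times are hypothetical
placeholders, and the layout is just one possibility.

```python
def format_report(conditions, results):
    """Build a plain-text report of technical conditions and solving times.

    conditions: free-text description of the technical conditions.
    results: dict mapping engine name to a list of solving times in
    seconds, one per position, with None for unsolved positions.
    """
    lines = [f"Conditions: {conditions}"]
    for engine, times in results.items():
        solved = [t for t in times if t is not None]
        lines.append(f"{engine}: {len(solved)}/{len(times)} solved")
        for number, t in enumerate(times, start=1):
            status = "not solved" if t is None else f"{t} s"
            lines.append(f"  pos. {number}: {status}")
    return "\n".join(lines)


# Hypothetical data for two engines on a three-position suite.
report = format_report(
    "hypothetical 1 GHz CPU, 64 MB hash, 60 s/pos.",
    {"Engine A": [4, None, 31], "Engine B": [12, 55, None]},
)
print(report)
```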

(Btw. the Quicktest description contains useful practical tips on how to test -
except for automatic testsuite processing, which I don't recommend. It's better
to watch the engines during their thinking process to learn about their
"behaviour"... of course this is not possible with very long testing times.)

Regards,
M.Scheidl


P.S. Also take a look at the positions at
http://www.andreas-schwartmann.de/chess.html#Test , and try them with various
engines, which may provide more insight into test position quality than talking
about it in theory.



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.