Computer Chess Club Archives



Subject: PG Test: Introduction/FAQ

Author: Dirk Frickenschmidt

Date: 11:05:21 11/22/97


***Play the game test: Introduction/FAQ***

1. What kind of test is this?

- The most common way of testing in computer chess is to use single
positions which require one or more key moves. In rare cases a short
series of moves (two or three) forms the key.
- Many years ago I wrote an article for the German computer chess
magazine CSS where I used a Tarrasch key position from the world
championship candidates match Kortchnoi-Kasparov to see how some of the
best chess computers then available treated this position. I was mainly
interested in the *playing* *style* of the various programs, although
*playing* *strength* was an issue as well. Each played a game from the
key position on - opening books switched off for all.
- More and more programmers have been using this testing method for
years. Most of the time they use key positions from lost test games to
see if a newer version of their program will play this kind of critical
position better against the same opponent than previous versions did. So
they use such tests mainly for *reducing* *weaknesses* their programs
showed earlier. I know programmers like Dave Kissinger, Julio Kaplan and
others tested this way, and I think it's a common method nowadays. Isn't
it, Ed, Chris, Bob, Bruce and all the others?
- Finally John Nunn developed such a test with 10 early middlegame
positions. The main goal was to have a good mix of opening positions
(from open and tactical to closed, roughly speaking) and to get a useful
impression of the overall *playing* *strength* of a program this way
from 20 games (10 positions played with black and white) against one
opponent at a time.

- My own goal now is to search for indicators of playing strength as
well as those of playing style.
I will try to cover different opening types with 15 positions (still not
many, but perhaps a bit more appropriate than 10), adding 5 more in the
form of fundamental endgames, since endgames are becoming more and more
important nowadays, especially in contests between similarly high rated
programs. I will take the opening positions not so much from the early
opening (as John Nunn did), but from later phases of the opening. This
seems more appropriate in times of huge opening books available for
nearly any modern top program.
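
For anyone who wants to script the bookkeeping, the planned set can be
laid out as a simple game schedule. The sketch below is only an
illustration: it assumes each position is played twice with colours
reversed, as in John Nunn's test (this is not stated for the PG test
itself), and the position labels and program names are placeholders.

```python
def build_schedule(positions, program_a, program_b):
    """Return one game per position and colour assignment.

    Assumption: every position is played twice with colours reversed,
    as in the Nunn test; the PG test itself may be run differently.
    """
    games = []
    for pos in positions:
        games.append({"position": pos, "white": program_a, "black": program_b})
        games.append({"position": pos, "white": program_b, "black": program_a})
    return games


# Placeholder labels: 15 late-opening positions plus 5 fundamental endgames.
positions = [f"opening-{i}" for i in range(1, 16)]
positions += [f"endgame-{i}" for i in range(1, 6)]

schedule = build_schedule(positions, "Program A", "Program B")
print(len(schedule))  # 20 positions x 2 colours = 40 games
```

With the 10 Nunn positions the same function gives the 20 games
mentioned above.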

2. What are the test conditions?

The test conditions are tournament level (40/120) and (nearly) equal
hardware for both programs, mainly concerning processor speed and
hashtable size. Of course anybody can feel free to run tests at other
time levels for fun, but I'm more interested in tournament games on
normal user or SSDF-like hardware. So hardware (for both) will range
from P90 to PII-300 (still rare) at the moment.
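
As a rough guide to what this time level means per move, here is a
minimal sketch. It assumes "40/120" is read in the usual way, as 40
moves in 120 minutes per side; the function name is made up for the
example.

```python
def average_seconds_per_move(moves, minutes):
    """Average thinking time per move for a 'moves in minutes' time control."""
    return minutes * 60 / moves


# Assumed reading of 40/120: 40 moves in 120 minutes for each side.
print(average_seconds_per_move(40, 120))  # 180.0 seconds, i.e. 3 minutes
```

The programs will of course distribute this time unevenly over the
game; the figure is only an average.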

Auto232 procedure:
a) Switch on "monitor mode", or whatever it is called (the program just
accepts the moves and does not answer for either side), for each
program.
b) Enter the opening moves up to the key position manually.
c) Switch off all books of both programs.
d) Switch off "monitor mode" and return to normal answering mode
(a computer begins to calculate a move as soon as it has received one).
e) Press CTRL-0 for the program to start with the first move (which may
be white or black).
f) Have the two programs play the match and save the game as a
*.pgn file (or save it in the program's own format and export it to PGN
later).
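
Once the games are saved as PGN, the results can be counted with a few
lines of script. The following is a minimal sketch, not part of the
Auto232 procedure itself: it relies only on the standard PGN "Result"
tag, and the sample games and program names are invented for the
example.

```python
import re

# Matches the standard PGN tag pair, e.g. [Result "1-0"].
RESULT_TAG = re.compile(r'\[Result\s+"([^"]+)"\]')


def tally_results(pgn_text):
    """Count games by result, based on the Result tag of each game."""
    counts = {"1-0": 0, "0-1": 0, "1/2-1/2": 0, "*": 0}
    for result in RESULT_TAG.findall(pgn_text):
        if result in counts:
            counts[result] += 1
    return counts


# Invented sample data: two very short games in one PGN text.
sample = '''[Event "PG test"]
[White "Program A"]
[Black "Program B"]
[Result "1-0"]

1. e4 e5 1-0

[Event "PG test"]
[White "Program B"]
[Black "Program A"]
[Result "1/2-1/2"]

1. d4 d5 1/2-1/2
'''

print(tally_results(sample))
```

This only counts results; judging playing style still means playing
through the games themselves.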

3. What do the results indicate?

- In the phase of collecting appropriate positions for my PG
("play-the-game") test one of the first goals is to see if the games
played prove the position to be exemplary enough for its purpose.

a) the positions should not be too determined and should allow several
"paths" of play, although certain key moves and manoeuvres could and
should be valuable and lead to a better performance. Otherwise one
single missed key move would already spoil the game, and the game would
give no impression of the overall performance of a program that had made
this one wrong decision. So an overly determined position would rather
be a variation of the common single position tests and would drastically
reduce the value of the rest of the game for evaluation.
But a good mix of determined and non-determined factors will help to
reveal some of the playing strength and playing style of the engines in
combat.

b) the positions should cover different kinds of pawn structures, from
relatively "open" positions with open lines and/or diagonals to
relatively "closed" positions, from certain motifs (like the "minority
attack") to others and, last but not least, from certain pawn
"skeletons" occurring in frequently played openings (like, let's say,
the a4 pawn in connection with the placement of the other pawns in the
Slav etc) to others.

c) the positions should, all in all, not favour a certain kind of play,
but give sharp tactical programs (Fritz, CM 5000 etc) similar chances
to the more "positional" ones (Rebel, Hiarcs) or the calm counter
punchers (Genius etc). I'm of course using all these rather dumb
descriptions cum grano salis, because program differences cannot be
described in overly simple patterns today.
(So hi Thorsten, I fear you must finally say goodbye to the old cliché
of the "two" forces: the holy light of the "knowledge" republic and the
dark forces of the empire of "the fast searchers"). :-)


4. How do we know what are "important" moves or plans in the chosen
positions?

- I usually look for a human key game by top players which shows
something of how a position can be played and should be treated
(including comments if I can get some).
- Occasionally I will add more human games played from the same position
as instructive or entertaining material.
- Finally I will debate my conclusions with any of you here, inviting
especially the stronger chess players to say a word or two...


5. Are the positions already fixed or can they be debated and exchanged?

Debating them is very welcome!
As soon as anyone finds a position of a similar kind, but with more
instructive characteristics and results, I will be ready to replace the
original one with it.

The first phase of the project will be mainly research (I must admit I
like this process of research even more than the hopefully useful
results at the end). And the more people contribute to analyzing and
debating, the better the result will be.

The second phase will be that of finally determining a test set which
has proved to work well for various programs.
Then the PG test could become one more standard test showing some of a
program's characteristics for an intermediate investment of time (not
just some blitz games, but also not 400 tournament games or more, from
which only the results provide useful and comparable information).

Kind regards from Dirk


P.S.

The first test position for playing and debating will follow soon.

By the way: it takes hours and hours to play through possible candidates
for test positions in Chessbase (or any other game database), then
check them against the above criteria, and finally start some test games
to see if the whole thing will be instructive at all.
So this is no trivial task for anybody (except perhaps Kortchnoi or
Bronstein or such guys)...




Last modified: Thu, 15 Apr 21 08:11:13 -0700
