Author: Dirk Frickenschmidt
Date: 02:34:44 12/13/97
***Play the game test: Introduction/FAQ***
1. What kind of test is this?
- The most common way of testing in computer chess is to use single
positions which require one or more key moves. In rare cases a short
series of moves (two or three) forms the key.
- Many years ago I wrote an article for the German computer chess
magazine CSS where I used a Tarrasch key position from the world
championship candidates match Kortchnoi-Kasparov to see how some of the
best chess computers then available treated this position. I was mainly
interested in the *playing* *style* of the various programs, although
*playing* *strength* was an issue as well. Each played a game from the
key position on, with opening books switched off for all.
- More and more programmers have been using this testing method for
years. Most often they use key positions from lost test games to see if
a newer version of their program will play this kind of critical
position better against the same opponent than previous versions did. So
they use such tests mainly for *reducing* *weaknesses* their programs
showed earlier. I know programmers like Dave Kissinger, Julio Kaplan and
others tested this way and think it's a common method nowadays. Isn't
it, Ed, Chris, Bob, Bruce and all the others?
- Finally John Nunn developed such a test with 10 early middlegame
positions. The main goal was to have a good mix of opening positions
(from open and tactical to closed, roughly speaking) and to get a useful
impression of the overall *playing* *strength* of a program this way
from 20 games (10 positions played with black and white) against one
opponent at a time.
- My own goal now is to search for indicators of playing strength as
well as those of playing style.
I will try to cover different opening types with 15 positions (still not
many, but perhaps a bit more appropriate than 10), adding 5 more in the
form of fundamental endgames, since endgames are becoming more and more
important nowadays, especially in contests between similarly high-rated
programs. I will take the opening positions not so much from the early
opening (as John Nunn did), but from later phases of the opening. This
seems more appropriate in times of huge opening books available for
nearly any modern top program.
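For anyone who wants to keep track of the games, the pairing scheme
described for the Nunn test above (every position played once with each
colour against the same opponent) can be sketched in a few lines of
Python. This is only an illustration: the function name, the position
labels and the engine names are placeholders of mine and not part of any
existing tool.

```python
def build_schedule(positions, engine_a, engine_b):
    """Return two games per position, with colours swapped in the second."""
    games = []
    for pos in positions:
        games.append({"position": pos, "white": engine_a, "black": engine_b})
        games.append({"position": pos, "white": engine_b, "black": engine_a})
    return games


# Placeholder labels; a real run would use the actual key positions.
positions = [f"Position {i}" for i in range(1, 11)]
schedule = build_schedule(positions, "Engine A", "Engine B")
print(len(schedule))  # 10 positions x 2 colours = 20 games
```

The same function would give 40 games for a set of 20 positions, if one
assumes that each of them is also played with both colours.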
2. What are the test conditions?
Test conditions are: tournament level (40/120) and (nearly) equal
hardware for both programs (mainly concerning processor speed and
hashtable size). Of course anybody can feel free to run tests at other
time levels for fun, but I'm more interested in tournament games on
normal user or SSDF-like hardware. So hardware (for both) will range
from P90 to PII-300 (still rare) at the moment.
Auto232-procedure:
a) Use the "monitor mode", or whatever it is called (the program just
accepts the moves and does not answer for either side), for each program.
b) Enter the opening moves up to the key position manually.
c) Switch off all books of both programs.
d) Switch off "monitor mode" and return to normal answering mode
(a computer begins to calculate a move as soon as it has received a move).
e) Press CTRL-0 for the program to start with the first move (which may
be White or Black).
f) Have the two programs play the match and save the game as a
*.pgn file (or save it in the program's native format and export it to
PGN later).
3. What do the results indicate?
- In the phase of collecting appropriate positions for my PG
("play-the-game") test one of the first goals is to see if the games
played prove the position to be exemplary enough for its purpose.
a) The positions should not be too determined and should allow several
"paths" of play, although certain key moves and manoeuvres could and
should be valuable and lead to a better performance. Otherwise one
single missed key move would already spoil the game. This would not give
an impression of the overall performance of a program that has made this
one wrong decision. So a too strongly determined position would rather
be a variation of the common single-position tests and would drastically
reduce the value of the rest of the game for evaluation.
But a good mix of determined and non-determined factors will help to
reveal some of the playing strength and playing style of the engines in
combat.
b) The positions should cover different kinds of pawn structures, from
relatively "open" positions with open lines and/or diagonals to
relatively "closed" positions, from certain motifs (like the "minority
attack") to others and, last but not least, from certain pawn "skeletons"
occurring in frequently played openings (like, say, the a4 pawn in
connection with the placement of the other pawns in the Slav) to others.
c) The positions should all in all not favour a certain kind of play,
but give sharp tactical programs (Fritz, CM 5000 etc) chances similar to
those of the more "positional" ones (Rebel, Hiarcs) or the calm counter
punchers (Genius etc). I'm of course using all these rather dumb
descriptions cum grano salis, because program differences cannot be
described in overly simple patterns today.
(So hi Thorsten, I fear you must finally say goodbye to the old cliché of
the "two" forces: the holy light of the "knowledge" republic and the
dark forces of the empire of "the fast searchers".) :-)
4. How do we know what is important in the chosen positions?
- I usually look for a human key game from the top players which shows
something of how a position can be played and should be treated
(including comments if I can get some).
- Occasionally I will add more human games played with the same position
as instructive or entertaining material.
5. Are the positions already fixed or can they be debated and exchanged?
Debating them is very welcome to me. As soon as anyone finds a position
of a similar kind, but with more instructive characteristics and results,
I will be ready to exchange it.
The first phase of the project is mainly research. And the more people
contribute to analyzing and debating, the better the result will be.
This has already happened to Position 1, try 1: after enough test games
had been played it proved to favour White too much, so it will be replaced.
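One simple way to spot such a position is to look at White's share of
the points over all test games played from it. The following Python
sketch is only an illustration: the function names and the threshold of
0.75 are my own assumptions and not a fixed criterion of the PG test.
The results are given as the usual PGN result strings.

```python
# Points scored by White for each PGN result string.
POINTS_FOR_WHITE = {"1-0": 1.0, "1/2-1/2": 0.5, "0-1": 0.0}


def white_score(results):
    """Return the fraction of points scored by White (0.0 to 1.0)."""
    return sum(POINTS_FOR_WHITE[r] for r in results) / len(results)


def favours_white(results, threshold=0.75):
    """True if White's score reaches the (assumed, arbitrary) threshold."""
    return white_score(results) >= threshold


# Made-up results for illustration only, not real test games.
sample = ["1-0", "1-0", "1/2-1/2", "1-0"]
print(white_score(sample), favours_white(sample))
```

With only a handful of games such a number is of course a rough hint
and not a proof, so a suspicious position still has to be looked at
game by game.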
The second phase will be that of finally determining the whole test set
with the positions that have proved to work well for various programs.
Then the PG test could become one more standard test showing some of a
program's characteristics for an intermediate investment of time (not
just some blitz games, but also not 400 tournament games or more from
which only the results will provide some useful and comparable
information).
Kind regards from Dirk