Computer Chess Club Archives



Subject: Proposal: New testing methods for SSDF (1)

Author: Jeroen Noomen

Date: 07:42:39 04/13/98


A few days ago I was asked by CCC (like most of you) to join the
discussion about Fritz 5. I have had Fritz 5 on my PC for only a few
weeks, too short a period to come to any definite conclusion.

IMO there is no doubt that Fritz 5 is a great program: a lot of nice
features, great graphics and a very strong chess program. One must
congratulate ChessBase on doing a fine job here.

Having seen the published games and played games myself (Jeroen vs.
Fritz 5, autoplayer games vs. other engines), I am sure Fritz 5 is one
of the top programs at the moment. It has a purely tactical style, which
is difficult for human players to handle. Out of any position Fritz
seems to be able to confuse matters and to come up with tactical
resources. Although I am of the opinion that 'knowledge-based' chess
programs are the future, there is no question that Fritz 5 plays a very
active kind of chess, trying to put its pieces in the right places and
searching for tactical possibilities. I disagree with any comment that
says Fritz 5 plays bad chess, is positionally weak and so on. With any
program it is possible to come up with positions in which that
particular program plays weak moves. ALL programs suffer from the fact
that they are still no grandmasters in positional play. On this point:
IMO it will be a long, long time before any chess program reaches the
level of understanding of the game of, e.g., Kasparov, Kramnik or Anand.

I have read a lot of comments about the Fritz - SSDF subject. A few of
them are:

1. Fritz 5 plays only with 64 MByte of Hash
2. Fritz 5 is allowed to play with endgame databases
3. Fritz 5 is using a special powerbook
4. Fritz 5 is tuned on the best programs in the SSDF-list
5. Fritz 5 cannot play with AUTO232

I don't know why these subjects are brought up; there seems to be
nothing wrong with any of them. IMO this is all very normal and there is
nothing 'illegal' in it. Take the point of tuning, for example: are not
all programmers doing this? It is absolute nonsense to attack ChessBase
for something that others have been doing for years!

There are, however, a few points that should be cleared up. If a neutral
organisation like SSDF tests chess programs, the conditions for these
tests should be as equal as possible. That means: if you present results
based on Pentium 200 MMX, these should be the SAME Pentiums, with the
same configuration.

Another point: the ChessBase autoplayer. One can ask the following
questions:

1. Why does a commercial organisation make an investment in a project
    (e.g. an autoplayer) if something similar already exists?
2. Why spend a lot of time and money building a piece of software,
    instead of only writing an AUTO232 driver, which is far easier and
    costs far less?
3. As I graduated in commercial economics at the Highschool for Business
    Economics, I have learned that a commercial organisation only makes
    an investment if it expects a profit from it: it should pay for
    itself, or better: the investment should generate more profit.

A lot of speculating has been done on this subject, but coming to the
main point of this topic I would like to ask all the programmers on CCC
the following questions:

a) Is it possible for an autoplayer to influence other programs? I.e.
    overruling data, stopping some necessary routines etc.
b) Is it possible to make an autoplayer that is advantageous for one
    side? I.e. one that makes the results for one side better than they
    really are.

I want to point out that I am absolutely no expert on this subject,
therefore I ask the authors of chess programs to answer these questions.
The third question:

c) If the answer to a) and b) is 'YES', would you agree to have your
    program tested against such autoplayers?

I have the feeling that it was wrong of SSDF to accept this way of
testing WITHOUT asking other programmers whether they agree. A great
risk is that programmers now remove the AUTO232 software or, 'worse',
start making their own autoplayers. This leads to 6 different
autoplayers (or more), and the results of SSDF would become more and
more unclear. Furthermore: instead of spending time and effort on
improving the playing strength and features of a program, all
programmers have to spend a huge amount of time on autoplayers,
booktuning, booklearners and so on. Inevitable conclusion: the real
playing strength improvements are becoming smaller and smaller, and
instead a lot of 'statistical improvements' are made, e.g. who is best
at copying won games to gain more Elo points. (Note that older programs
have no defence against such treatment. If you take program A without a
learner and the very same program with a learner, then the latter will
win, although it is the same program and not stronger at all. But
statistically everybody will say that the second program is stronger.)

To put an end to the speculation, the developments mentioned above and
the recent 'bookwar' (as I call it myself), I would like to come up with
two proposals for new test methods that could be used to determine the
real differences in playing strength between chess programs. I do not
want to imply these tests are the 'coup de grace', but they might still
be useful as an alternative, to prevent the influence of killer lines,
booklearners and so on.

Proposal 1:  The universal SSDF openingbook
------------------------------------------------------------------

* Out of - let's say 500,000 - grandmaster games the SSDF makes a
  universal openingbook.
* This openingbook will not be published or made available to the chess
  programmers.
* All tested programs will have to use the universal book; all testgames
  and matches will be played only with this book.
* Each match consists of a predefined number of games, let's say 40. All
  matches should consist of 40 played games.
* Doubles are not counted.
* The configuration should be exactly the same for both opponents. A
  standard can be agreed upon, e.g. Pentium 200 MMX with 64 MByte of
  memory.
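
The rules "40 played games" and "doubles are not counted" could be
sketched as follows. This is only an illustration: the post does not
define a "double", so the sketch assumes it means a game whose complete
move sequence is identical to an earlier game of the same match. The
function name and the data layout are hypothetical.

```python
def counted_games(games, target=40):
    """Return the games that count toward a match, skipping doubles.

    `games` is an iterable of move lists in the order they were played.
    ASSUMPTION: a "double" is a game with exactly the same moves as an
    earlier game in the same match.
    """
    seen = set()
    counted = []
    for moves in games:
        key = tuple(moves)
        if key in seen:
            continue  # double: played, but not counted
        seen.add(key)
        counted.append(list(moves))
        if len(counted) == target:
            break  # the match is complete
    return counted
```

Under this assumption a match simply continues until the target number
of distinct games has been reached.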

Under these conditions older programs can participate as well, as the
effects of improved openingbooks, bookkilling and booklearners are ruled
out. A disadvantage could be that the new programs - with learners -
could still learn which lines in the universal book are suitable and
which are not.

One can argue that a universal openingbook violates the principle that
the openingbook is a part of the chess program. I agree with that, but I
want to point out that this test method is meant to compare the playing
strength of chess programs, not opening books.

Proposal 2:  50 openingpositions, each program plays one game with White
             and one with Black in every openingposition.
------------------------------------------------------------------------

In Computer Schach & Spiele (CSS) the Nunn testpositions were
introduced, as well as the SeE-test (see e.g. the excellent article by
Arno Nickel in the latest CSS, pages 45-49). Some six years ago I made a
comparable test set to test the several ChessMachine versions of TASC
(e.g. The King and ChessMachine Madrid).

IMO the current Nunn- and SeE-tests are a very interesting approach to
comparing the playing strength of chess programs, the disadvantage being
that these two tests contain too few testpositions. In the ChessMachine
period my test consisted of 50 different testpositions, chosen according
to the following rules:

* All popular and theoretically interesting openinglines are in this
  test.
* Less popular lines are also included.
* The test contains open, closed and semi-closed positions, as well as
  positions with isolated pawns, blocked structures etc.
* The endposition of each test should not give one side a clear
  advantage, to prevent 1-1 results (each program winning with the
  favoured colour) in that line all the time.
* In a match both programs play each testposition as White and as Black.

The advantages are clear: booktuning, booklearning and similar tricks
are out of the question, and because a valid range of testpositions is
used, containing positions that appear often in tournaments between
strong players, such a test will provide good information about the real
playing strength of a chess program.

It is also possible to compare different versions by the same
programmer, e.g. matches between MCP 5, 6, 7; Genius 3, 4, 5;
Fritz 3, 4, 5; and Hiarcs 4, 5, 6 can be held. This way you can see in
which way a program has been improved (or not).

Of course there are disadvantages as well: 100 games at 40 moves in 2
hours take a lot of time. Before it is possible to make a new Elo list
following this test method it will probably take a year or so.

To make it possible we could set the following rules:

1. The 50 positions are from modern theory and will be chosen by a
    strong chess player.
2. These positions will not be published or given to any of the
    competitors.
3. In each match both competing programs have to play each position as
    White and as Black.
4. Openingbooks are turned off; all games have to start from the
    testposition.
5. At least 5 different matches (500 games) have to be played before a
    program appears on the ratinglist.
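
The arithmetic behind rules 3 and 5 can be made explicit with a small
sketch: 50 positions played with both colours give 100 games per match,
and 5 matches give the 500 games required before a program is listed.
The function names are hypothetical and only serve as illustration.

```python
def match_schedule(program_a, program_b, n_positions=50):
    """List the games of one match as (position, white, black) tuples.

    Every test position is played twice, with colours reversed.
    """
    schedule = []
    for position in range(1, n_positions + 1):
        schedule.append((position, program_a, program_b))  # A has White
        schedule.append((position, program_b, program_a))  # B has White
    return schedule


def games_before_rating(n_matches=5, n_positions=50):
    """Number of games a program must play before it is listed."""
    return n_matches * 2 * n_positions
```

With the default values the schedule contains 100 games and
games_before_rating() gives the 500 games named in rule 5.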

This way we can provide a ratinglist that is free from influences caused
by booktricks, giving a good picture of the allround playing strength of
a chess program. Statistics can also be compiled on which positions a
program plays well and which not. (See also Arno Nickel's article in
CSS.)
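
Such per-position statistics could be collected along the following
lines. This is a minimal sketch, not a description of any existing SSDF
tool: the result format (position number, White, Black, score from
White's point of view) and the function name are assumptions made for
the example.

```python
from collections import defaultdict


def position_scores(results, program):
    """Sum one program's score and game count per test position.

    `results` holds (position, white, black, white_score) tuples, where
    white_score is 1.0, 0.5 or 0.0 from White's point of view
    (hypothetical format).
    """
    totals = defaultdict(lambda: [0.0, 0])  # position -> [score, games]
    for position, white, black, white_score in results:
        if program == white:
            score = white_score
        elif program == black:
            score = 1.0 - white_score
        else:
            continue  # game between two other programs
        totals[position][0] += score
        totals[position][1] += 1
    return {pos: (score, games) for pos, (score, games) in totals.items()}
```

Positions in which a program scores far below its overall average would
then stand out directly.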

Finally, I want to present a possible set of 50 testpositions, just to
give an example of how I made them 6 years ago for testing the
ChessMachine (see below).

I want to stress that people sometimes forget what an enormous job our
chess friends in Sweden are doing for the sake of computer chess. This
is all done by volunteers, free of charge, who devote parts of their
lives just to give people insight into the rankings of chess programs.
They deserve great credit, because nobody asked for this Elo list and
still they are testing chess programs, for the benefit of programmers,
chess enthusiasts and people who need information for buying a chess
program.

In the meantime, however, it is also necessary to ask ourselves whether
the recent developments are in the interest of computer chess or not.
Maybe other test methods are necessary. I would like to ask all people
involved to think about this idea. All suggestions are welcome.

50 testpositions to test playing strength of chess programs
===========================================

Position 1:  Ruy Lopez (1)
-------------------------------------
1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 4. Ba4 Nf6 5. 0-0 Be7 6. Re1 b5 7. Bb3 d6
8. c3 0-0 9. h3 Bb7

Position 2: Ruy Lopez (2)
------------------------------------
1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 4. Ba4 Nf6 5. 0-0 Nxe4 6. d4 b5 7. Bb3 d5
8. dxe5 Be6

Position 3: Ruy Lopez (3)
------------------------------------
1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 4. Bxc6 dxc6 5. d4 exd4 6. Qxd4 Qxd4
7. Nxd4

Position 4: Sicilian (1)
-------------------------------
1. e4 c5 2. Nf3 d6 3. d4 cxd4 4. Nxd4 Nf6 5. Nc3 a6 6. Be2 e6 7. 0-0 Be7
8. f4 0-0 9. Be3 Qc7

Position 5: Sicilian (2)
-------------------------------
1. e4 c5 2. Nf3 Nc6 3. d4 cxd4 4. Nxd4 Nf6 5. Nc3 d6 6. Bg5 e6 7. Qd2 a6
8. 0-0-0 h6 9. Be3 Be7 10. f4

Position 6: Sicilian (3)
-------------------------------
1. e4 c5 2. Nf3 e6 3. d4 cxd4 4. Nxd4 Nc6 5. Nc3 a6 6. Be2 Qc7

Position 7: Sicilian (4)
-------------------------------
1. e4 c5 2. c3 d5 3. exd5 Qxd5 4. d4 Nf6 5. Nf3 Bg4 6. Be2 e6 7. 0-0 Nc6
8. h3 Bh5 9. Be3 cxd4 10. cxd4 Be7 11. Nc3 Qd6

Position 8: French (1)
-------------------------------
1. e4 e6 2. d4 d5 3. Nc3 Bb4 4. e5 c5 5. a3 Bxc3 6. bxc3 Ne7 7. Nf3

Position 9: French (2)
-------------------------------
1. e4 e6 2. d4 d5 3. Nd2 c5 4. exd5 exd5 5. Ngf3

Position 10: French (3)
--------------------------------
1. e4 e6 2. d4 d5 3. e5 c5 4. c3 Nc6 5. Nf3 Qb6 6. a3 c4

Position 11: Caro-Kann (1)
--------------------------------------
1. e4 c6 2. d4 d5 3. Nc3 dxe4 4. Nxe4 Nd7 5. Ng5 Ngf6 6. Bd3 e6
7. N1f3 Bd6 8. Qe2 h6 9. Ne4 Nxe4 10. Qxe4 Nf6 11. Qe2

Position 12: Caro-Kann (2)
--------------------------------------
1. e4 c6 2. d4 d5 3. Nc3 dxe4 4. Nxe4 Bf5 5. Ng3 Bg6 6. h4 h6 7. h5 Bh7
8. Nf3 Nd7 9. Bd3 Bxd3 10. Qxd3 e6 11. Bd2 Qc7 12. 0-0-0 Ngf6
13. Ne4 0-0-0 14. g3

Position 13: Caro-Kann (3)
-------------------------------------
1. e4 c6 2. d4 d5 3. e5 Bf5 4. Nf3 e6 5. Be2

Position 14: Scotch
----------------------------
1. e4 e5 2. Nf3 Nc6 3. d4 exd4 4. Nxd4 Bc5 5. Be3 Qf6 6. c3 Nge7

Position 15: Italian game
-----------------------------------
1. e4 e5 2. Nf3 Nc6 3. Bc4 Bc5 4. d3 Nf6 5. c3 d6 6. b4 Bb6 7. a4 a6

Position 16: Petroff
---------------------------
1. e4 e5 2. Nf3 Nf6 3. Nxe5 d6 4. Nf3 Nxe4 5. d4 d5 6. Bd3 Nc6
7. 0-0 Be7

Position 17: Four Knights
------------------------------------
1. e4 e5 2. Nf3 Nc6 3. Nc3 Nf6 4. Bb5 Bb4 5. 0-0 0-0 6. d3 d6
7. Bg5 Bxc3 8. bxc3

Position 18: Pirc-Ufimzew (1)
------------------------------------------
1. e4 d6 2. d4 Nf6 3. Nc3 g6 4. f4 Bg7 5. Nf3 0-0 6. Bd3

Position 19: Pirc-Ufimzew (2)
------------------------------------------
1. e4 d6 2. d4 Nf6 3. Nc3 g6 4. Nf3 Bg7 5. Be2 0-0 6. 0-0 Bg4

Position 20: Scandinavian
-------------------------------------
1. e4 d5 2. exd5 Qxd5 3. Nc3 Qa5 4. d4 Nf6 5. Nf3 c6 6. Bc4 Bf5

Position 21: Alekhine
------------------------------
1. e4 Nf6 2. e5 Nd5 3. d4 d6 4. Nf3 g6 5. Bc4 Nb6 6. Bb3 Bg7 7. a4 a5

Position 22: Modern Defence
------------------------------------------
1. e4 g6 2. d4 Bg7 3. Nc3 d6 4. f4 c6 5. Nf3 Bg4 6. Be3 Qb6 7. Qd2 Nd7

Position 23: Slav (1)
-----------------------------
1. d4 d5 2. c4 c6 3. Nf3 Nf6 4. Nc3 dxc4 5. a4 Bf5 6. e3 e6 7. Bxc4 Bb4
8. 0-0 Nbd7

Position 24: Slav (2)
-----------------------------
1. d4 d5 2. c4 c6 3. Nf3 Nf6 4. Nc3 dxc4 5. a4 Bf5 6. Ne5 e6 7. f3 Bb4
8. e4 Bxe4 9. fxe4 Nxe4 10. Bd2 Qxd4 11. Nxe4 Qxe4 12. Qe2 Bxd2+
13. Kxd2 Qd5+ 14. Kc2 Na6 15. Nxc4

Position 25: Slav (3)
-----------------------------
1. d4 d5 2. c4 c6 3. Nf3 Nf6 4. Nc3 e6 5. e3 Nbd7 6. Bd3 dxc4 7. Bxc4 b5
8. Bd3 Bb7 9. 0-0 a6 10. e4 c5

Position 26: Slav (4)
-----------------------------
1. d4 d5 2. c4 c6 3. Nf3 Nf6 4. Nc3 e6 5. e3 Nbd7 6. Qc2 Bd6 7. Bd3 0-0
8. 0-0

Position 27: QGD (1)
------------------------------
1. d4 d5 2. c4 e6 3. Nc3 Nf6 4. Bg5 Be7 5. e3 0-0 6. Nf3 h6 7. Bh4 b6

Position 28: QGD (2)
------------------------------
1. d4 d5 2. c4 e6 3. Nc3 Nf6 4. cxd5 exd5 5. Bg5 Be7 6. e3 c6 7. Qc2 0-0
8. Bd3 Nbd7 9. Nge2 Re8 10. 0-0 Nf8

Position 29: QGD Tarrasch
---------------------------------------
1. d4 d5 2. c4 e6 3. Nc3 c5 4. cxd5 exd5 5. Nf3 Nc6 6. g3 Nf6 7. Bg2 Be7
8. 0-0 0-0 9. Bg5 cxd4 10. Nxd4 h6 11. Be3 Re8

Position 30: QGD Semi-Tarrasch
-----------------------------------------------
1. d4 d5 2. c4 e6 3. Nc3 Nf6 4. Nf3 c5 5. cxd5 Nxd5 6. e4 Nxc3
7. bxc3 cxd4 8. cxd4 Bb4+ 9. Bd2 Bxd2+ 10. Qxd2

Position 31: QGA (1)
------------------------------
1. d4 d5 2. c4 dxc4 3. Nf3 Nf6 4. e3 e6 5. Bxc4 c5 6. 0-0 a6 7. a4 Nc6
8. Qe2 cxd4 9. Rd1 Be7 10. exd4 0-0 11. Nc3

Position 32: QGA (2)
------------------------------
1. d4 d5 2. c4 dxc4 3. e4 e5 4. Nf3 Bb4+ 5. Nc3 exd4 6. Qxd4 Qxd4
7. Nxd4 Nf6 8. f3

Position 33: Nimzo-Indian (1)
-----------------------------------------
1. d4 Nf6 2. c4 e6 3. Nc3 Bb4 4. e3 0-0 5. Bd3 d5 6. Nf3 c5 7. 0-0 Nc6
8. a3 Bxc3 9. bxc3

Position 34: Nimzo-Indian (2)
------------------------------------------
1. d4 Nf6 2. c4 e6 3. Nc3 Bb4 4. Qc2 0-0 5. a3 Bxc3 6. Qxc3 b6 7. Bg5 h6
8. Bh4 Bb7

Position 35: Queen's Indian (1)
--------------------------------------------
1. d4 Nf6 2. c4 e6 3. Nf3 b6 4. g3 Ba6 5. b3 Bb4+ 6. Bd2 Be7 7. Bg2 d5
8. Ne5 Nfd7 9. Nxd7 Nxd7 10. Bc3 0-0 11. Nd2 Rc8 12. 0-0

Position 36: Queen's Indian (2)
--------------------------------------------
1. d4 Nf6 2. c4 e6 3. Nf3 b6 4. a3 Ba6 5. Qc2 Bb7 6. Nc3 c5 7. e4 cxd4
8. Nxd4

Position 37: King's Indian (1)
-----------------------------------------
1. d4 Nf6 2. c4 g6 3. Nc3 Bg7 4. e4 d6 5. Nf3 0-0 6. Be2 e5 7. 0-0 Nc6
8. d5 Ne7 9. Ne1 Nd7 10. Nd3 f5 11. Bd2 Nf6 12. f3 f4

Position 38: King's Indian (2)
-----------------------------------------
1. d4 Nf6 2. c4 g6 3. Nc3 Bg7 4. e4 d6 5. Nf3 0-0 6. Be2 e5 7. 0-0 Nc6
8. d5 Ne7 9. b4 Nh5 10. Re1

Position 39: King's Indian (3)
-----------------------------------------
1. d4 Nf6 2. c4 g6 3. Nc3 Bg7 4. e4 d6 5. f3 0-0 6. Be3 e5 7. d5 Nh5
8. Qd2 f5

Position 40: Grunfeld Indian
----------------------------------------
1. d4 Nf6 2. c4 g6 3. Nc3 d5 4. cxd5 Nxd5 5. e4 Nxc3 6. bxc3 Bg7
7. Bc4 c5 8. Ne2 Nc6 9. Be3 0-0 10. 0-0

Position 41: Benoni
----------------------------
1. d4 Nf6 2. c4 c5 3. d5 e5 4. Nc3 d6 5. e4 Be7

Position 42: Benko gambit
--------------------------------------
1. d4 Nf6 2. c4 c5 3. d5 b5 4. cxb5 a6 5. bxa6 Bxa6 6. Nc3 g6 7. Nf3 Bg7
8. g3 d6 9. Bg2 0-0 10. 0-0 Nbd7

Position 43: English (1)
---------------------------------
1. c4 e5 2. Nc3 Nf6 3. Nf3 Nc6 4. g3 Bb4 5. Bg2 0-0 6. 0-0 e4
7. Ng5 Bxc3 8. bxc3 Re8

Position 44: English (2)
---------------------------------
1. c4 Nf6 2. Nf3 e6 3. Nc3 Bb4 4. Qc2 0-0 5. a3 Bxc3 6. Qxc3 c5 7. g3 b6
8. Bg2 Bb7

Position 45: English (3)
---------------------------------
1. c4 c5 2. Nc3 Nc6 3. Nf3 Nf6 4. g3 g6 5. Bg2 Bg7 6. 0-0 0-0 7. d4 cxd4
8. Nxd4 Nxd4 9. Qxd4 d6

Position 46: Reti
------------------------
1. Nf3 d5 2. c4 c6 3. g3 Nf6 4. b3 Bg4 5. Bg2 e6 6. Bb2 Nbd7 7. 0-0 Bd6
8. d3 0-0 9. Nbd2

Position 47: Bird's Opening
---------------------------------------
1. f4 d5 2. Nf3 Nf6 3. e3

Position 48: Dunst/Van Geet opening
------------------------------------------------------
1. Nc3

Position 49: Flank opening
--------------------------------------
1. g3 e5 2. Bg2 d5 3. d3

Position 50: Larsen's opening
------------------------------------------
1. b3 e5 2. Bb2 Nc6 3. e3
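
A practical note for anyone who wants to feed the lines above to a PGN
tool: PGN writes castling with the capital letter O (O-O, O-O-O), while
this list uses the digit zero (0-0, 0-0-0). The following minimal sketch
converts one line of the list and splits it into single moves. The
function names are hypothetical, and the sketch assumes the moves are
written exactly in the style used above.

```python
import re


def to_pgn_movetext(line):
    """Join wrapped lines and rewrite 0-0 / 0-0-0 as O-O / O-O-O."""
    text = " ".join(line.split())
    # Handle queenside castling first so that "0-0-0" is not partly
    # rewritten by the kingside pattern.
    text = re.sub(r"\b0-0-0\b", "O-O-O", text)
    text = re.sub(r"\b0-0\b", "O-O", text)
    return text


def plies(line):
    """Return the single moves of a line, without the move numbers."""
    tokens = to_pgn_movetext(line).split()
    return [t for t in tokens if not re.fullmatch(r"\d+\.", t)]
```

An even number of moves returned by plies() means that White is to move
in the testposition, an odd number that Black is to move.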










Last modified: Thu, 07 Jul 11 08:48:38 -0700
