Computer Chess Club Archives


Subject: Tuning and testing chess programs (was:SSDF and Junior6a and Fritz6a)

Author: Heiko Mikala

Date: 14:25:42 01/19/00


Hi Christophe!

On January 19, 2000 at 12:51:17, Christophe Theron wrote:


Sorry, Christophe, it was not my intention to attack or upset you.

I only wrote this because Vincent's post implied that Fritz 6a is a version of
Fritz specifically tuned to perform better against Tiger.


>Why do you believe so strongly that programmers "tune" their engines against
>each other?
>
>I think it would take more time to "tune" against a program than to simply
>improve the engine, and anyway it would certainly weaken the program overall.
>
>For your information, I have the Fritz6 CD at home. Frederic Friedel has been
>kind enough to send it to me. Unfortunately I have never found the time to try
>this program. It is still inside its sealed package, I don't even know what the
>GUI looks like.
>
>I'll certainly install it later, as soon as I have some time, because I'm
>curious about it, of course. And I'll certainly download the engine update.
>
>I'll also play some manual games against Tiger in order to see if it can help me
>to find weaknesses in my own engine, but I'll never use it to play thousands of
>automatic games or to "tune" against it. That's not how I work.
>
>
>The Rebel-Tiger engine has been frozen at the end of November, and the opening
>book is frozen since August.
>
>Forget about this "tuning" stuff, it's not the way we improve our chess
>programs.
>
>I don't do that, and I don't think Frans or Amir would do that either. I think
>they are experienced and wise enough to avoid this shortsighted way of working.


As you may or may not know, I've been working on my own chess program for more
than 10 years.

For many, many years my way of testing improvements in my program was mainly
to use large sets of test positions and only occasionally play a game or two
against another program. I played these games against other programs because I
am far too weak a chess player to be a real test for my own program. I played
only very few of them because there was no way to let the programs play
automatically. So I had my set of well-chosen test positions, for which I knew
where my program had problems and why. I first used a few of these positions to
test the changes. Then, when I thought the changes might be OK, I ran a large
test set overnight to see their impact on other positions. I then collected all
the data so that I could compare different versions. After that I played a few
test games (yes, I played against it myself too, but the games against other
programs like Gnu Chess, Fritz 1/2, Chessmaster 3000 and so on were more
important). I watched these games very closely, and I had to anyway, because I
had to play them manually. I always found it amazing how much you can find out
about your own program by seeing it play against strong opponents.
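
The version comparison described above can be sketched roughly as follows. This is only an illustration, not the tooling actually used: the function name compare_runs, the position ids, and the idea of storing each run as a mapping from position id to solved/not solved are all assumptions made for the example.

```python
def compare_runs(old, new):
    """Compare two test-suite runs.

    old/new map a position id to True if that engine version
    found the expected move (hypothetical data layout).
    """
    common = sorted(set(old) & set(new))
    # Positions the old version solved but the new one no longer does.
    regressions = [p for p in common if old[p] and not new[p]]
    # Positions only the new version solves.
    improvements = [p for p in common if not old[p] and new[p]]
    return {
        "solved_old": sum(old[p] for p in common),
        "solved_new": sum(new[p] for p in common),
        "regressions": regressions,
        "improvements": improvements,
    }


# Made-up results for two versions on four positions.
old_run = {"pos01": True, "pos02": False, "pos03": True, "pos04": False}
new_run = {"pos01": True, "pos02": True, "pos03": False, "pos04": False}

report = compare_runs(old_run, new_run)
print(report)
```

Listing regressions separately matters here: two versions can solve the same number of positions while differing in which ones they solve, and the raw totals alone would hide that.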

One important point that I learned in all this is that a version that does
exceptionally well in tactical test suites may be a disaster in real play. That
doesn't mean, of course, that all versions that are good at solving test
positions are bad in real play, but test positions alone are not enough in my
opinion.

All this changed dramatically when Winboard became popular and I made my
engine Winboard-compatible. Suddenly I had a perfect way to let my program
automatically play test matches against other programs. Believe it or not, this
helped me a lot. In a very short time I was able to find new weaknesses in my
program that I wasn't aware of before, simply because I found out that in a
series of 40-60 games my program lost a whole lot of games because of the same
fault.

For some time I used only these test matches to test changes in my program, but
in the meantime I have changed my mind again, and I now think that a combination
of test matches and sets of test positions must be the best way of testing.

Concerning the test matches and tuning against other programs, I don't think
either that it really pays off to tune a program against one single opponent,
because the program will most definitely be weaker against other opponents
then. I tried this, and at one point I had a version that won each test match
against program A (no names here) and lost each match against program B. With
only a simple change I could make my program win the matches against program B,
but then it lost its matches against opponent A. It was even a bit more
complicated, because 4 or 5 different opponents were involved.
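
One way to keep such A-versus-B swings in view is to score every version against the whole pool of opponents rather than against a single one. The following is a minimal sketch under that assumption; the helper names score_pct and summarize, the version labels, and all win/draw/loss numbers are invented for illustration and are not results from any real match.

```python
def score_pct(wins, draws, losses):
    """Match score in percent, counting a draw as half a point."""
    games = wins + draws + losses
    return 100.0 * (wins + 0.5 * draws) / games


def summarize(results):
    """results maps opponent name -> (wins, draws, losses)."""
    per_opponent = {name: score_pct(*wdl) for name, wdl in results.items()}
    # Sum wins, draws and losses column-wise over all opponents.
    totals = [sum(column) for column in zip(*results.values())]
    return per_opponent, score_pct(*totals)


# Hypothetical versions: X is strong against A but weak against B,
# Y is more balanced.
version_x = {"A": (30, 10, 10), "B": (10, 10, 30)}
version_y = {"A": (22, 12, 16), "B": (20, 14, 16)}

for label, results in (("X", version_x), ("Y", version_y)):
    per_opponent, overall = summarize(results)
    print(label, per_opponent, overall)
```

With these made-up numbers, version X scores 70% against A but only 50% over the pool, while version Y scores 56% against A and 55% overall, which is the kind of trade-off described above.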

So, to make it short, I generally agree with you that it doesn't make much
sense to tune against a single opponent, but I do think that playing long test
matches against other programs helps in improving the strength of a chess
program.

And I think I know of at least one top programmer, whom I think you know very
well too, who plays large series of test games and sometimes even publishes the
results. And I think that Chessbase too collects data from test matches produced
by its beta testers. Really, I think that most chess programmers use the results
of their programs' games to improve the engines.

How do you test your program? Only with test positions? Or only with the games
played by members of your chess club against Tiger, which you were talking
about? Why don't you think that long matches of Tiger against other top
programs might help you to find weaknesses in Tiger?


Again, sorry if I upset you, Christophe; that was not my intention!

Greetings,

Heiko.
