Computer Chess Club Archives

Subject: Re: Tuning and testing chess programs (was:SSDF and Junior6a and Fritz6a)

Author: Christophe Theron
Date: 18:00:02 01/19/00
On January 19, 2000 at 17:25:42, Heiko Mikala wrote:

>
>Hi Christophe!
>
>On January 19, 2000 at 12:51:17, Christophe Theron wrote:
>
>
>Sorry Christophe, it was not my intention to attack or upset you.
>
>I only wrote this because Vincent's post implied that Fritz 6a is a version of
>Fritz specifically tuned to perform better against Tiger.


I'm not upset, but I've been fighting against some legends for a long time.
"Tuning" is one of them.

I think I'd better stop fighting windmills and let people believe in
things like:

* "tuning"
* fast searcher => bad in positional play
* slow searcher => a lot of knowledge
* a 10 game match is significant
* 1h/game and 2h/40moves are totally different things

Looks like most people really LOVE to believe in this rubbish.

Not you specifically, but your comment about tuning could have been interpreted
as meaning that you believe in "tuning".
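
To put a rough number on the "10 game match" item in the list above, here is a minimal sketch that is not part of the original post. It assumes independent games and a normal approximation, which is crude for very short matches; the function name `score_margin` and all results are invented for illustration.

```python
import math

def score_margin(wins, draws, losses, z=1.96):
    """Return (score, margin): mean score per game and ~95% margin of error."""
    games = wins + draws + losses
    if games == 0:
        raise ValueError("no games played")
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2) / games
    margin = z * math.sqrt(variance / games)
    return score, margin

# Hypothetical results: the same 60% score over 10 games and over 1000 games.
short_match = score_margin(5, 2, 3)
long_match = score_margin(500, 200, 300)

for label, (score, margin) in (("10 games", short_match),
                               ("1000 games", long_match)):
    print(f"{label}: {score:.1%} +/- {margin:.1%}")
```

With these made-up numbers, the 10-game margin is about 27 points either side of 60%, so the result is compatible with two equally strong programs. Over 1000 games the margin shrinks to about 2.7 points.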



>>Why do you believe so strongly that programmers "tune" their engines against
>>each other?
>>
>>I think it would take more time to "tune" against a program than to simply
>>improve the engine, and anyway it would certainly weaken the program overall.
>>
>>For your information, I have the Fritz6 CD at home. Frederic Friedel has been
>>kind enough to send it to me. Unfortunately I have never found the time to try
>>this program. It is still inside its sealed package, I don't even know what the
>>GUI looks like.
>>
>>I'll certainly install it later, as soon as I have some time, because I'm
>>curious about it, of course. And I'll certainly download the engine update.
>>
>>I'll also play some manual games against Tiger in order to see if it can help me
>>to find weaknesses in my own engine, but I'll never use it to play thousands of
>>automatic games or to "tune" against it. That's not how I work.
>>
>>
>>The Rebel-Tiger engine was frozen at the end of November, and the opening
>>book has been frozen since August.
>>
>>Forget about this "tuning" stuff, it's not the way we improve our chess
>>programs.
>>
>>I don't do that, and I don't think Frans or Amir would do that either. I think
>>they are experienced and wise enough to avoid this shortsighted way of working.
>
>
>As you may or may not know, I've been working on my own chess program for more
>than 10 years.
>
>For many, many years my own way of testing improvements in my program was mainly
>to use large sets of test positions and only sometimes play a game or two
>against another program. The reason for playing these games against other
>programs was that I'm by far too weak a chess player to be a real test for my
>own program. The reason I played only very few of these test games was that
>there was no way to let the programs play automatically. So I had my set of
>well-chosen test positions, for which I knew where my program had problems and
>why it had them. I first used a few of these positions to test the changes;
>then, when I thought the changes might be OK, I ran a large test set overnight
>to see the impact of the changes on other positions. I then collected all the
>data to be able to compare different versions. Then I played a few test games
>(yes, I played myself too, but more important were the games against other
>programs like Gnu Chess, Fritz 1/2, Chessmaster 3000 and so on). I watched these
>games very closely, and I had to anyway, because I had to play them manually. I
>always found it amazing how much you can find out about your own program by
>seeing it play against other strong opponents.


Nothing wrong with your methodology. Sounds reasonable.



>One important point that I learned in all this is that a version that does
>exceptionally well in tactical test suites may be a disaster in real play. That
>doesn't mean, of course, that all versions that are good at solving test
>positions are bad in real play, but test positions alone are not enough in my
>opinion.


Definitely right.



>All this changed dramatically when Winboard became popular and I made my
>engine Winboard-compatible. Suddenly I had a perfect way to let my program
>automatically play test matches against other programs. You may believe it or
>not, but this helped me a lot. In a very short time, I was able to find new
>weaknesses in my program that I wasn't aware of before, simply because I found
>out that in a series of 40-60 games my program lost a whole lot of games
>because of the same fault.


The important thing is to find a way to spot the weaknesses in your program. The
opponents can be human players or computers; that's equally OK.

So you can use human players or computers, and in both cases you'll improve.

I don't call this "tuning".



>For some time I only used these test matches to test changes in my program, but
>in the meantime I have changed my mind again and now think that a combination of
>test matches and sets of test positions must be the best way of testing.
>
>Concerning the test matches and tuning against other programs, I too don't
>think that it really pays off to tune a program against one single
>opponent, because the program will most definitely be weaker against other
>opponents then. I tried this, and at one point I had a version that won each
>test match against program A (no names here) and lost each match against program
>B. With only a simple change I could make my program win the matches against
>program B, but then it lost its matches against program A. It was even a bit
>more complicated, because 4 or 5 different opponents were involved.


Absolutely correct. This is tuning. Tuning is, in my opinion, tweaking your
engine's evaluation parameters in order to get the best result against a given
opponent. Tuning is not improving, because in fact you don't add new knowledge
to your program. And you focus on one opponent, which is likely to backfire
because you are likely to weaken your program against other opponents.

If "tuning" makes your program better against various opponents, then it's
simply because your program initially had rather bad evaluation settings. In
this case I would not call it "tuning".

So things are more complicated than one might expect.
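
The program A / program B situation quoted above can be illustrated with a small sketch, which is not part of the original post. The version names, opponent names and scores are all invented; the only point is that the verdict from a single opponent can flip while the average over a pool barely moves.

```python
# Hypothetical match scores (fraction of points won) for two versions of an
# engine against a pool of opponents. All names and numbers are invented.
results = {
    "v1": {"A": 0.70, "B": 0.35, "C": 0.50, "D": 0.48},
    "v2": {"A": 0.40, "B": 0.65, "C": 0.50, "D": 0.49},
}

def pool_score(scores):
    """Average score over all opponents, each opponent weighted equally."""
    return sum(scores.values()) / len(scores)

def best_against(all_results, opponent):
    """Version that looks best if you only watch a single opponent."""
    return max(all_results, key=lambda version: all_results[version][opponent])

for version, scores in results.items():
    print(version, "pool average:", round(pool_score(scores), 4))

print("best against A only:", best_against(results, "A"))
print("best against B only:", best_against(results, "B"))
```

Watching only opponent A or only opponent B gives opposite verdicts here, while the two pool averages differ by a quarter of a percentage point.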



>So, to make it short, I generally agree with you that it doesn't make much
>sense to tune against a single opponent, but I do think that playing long test
>matches against other programs helps in improving the strength of a chess
>program.


This helps you find weaknesses in your program, and I see nothing wrong with
this.

You can just as well let your program play on a chess server. It is the same. In the
end you find weaknesses and you work to fix them.



>And I think I know of at least one top programmer, whom I think you know very
>well too, who plays large series of test games and sometimes even publishes the
>results. And I think that Chessbase too collects data from test matches produced
>by its beta testers. Really, I think that most chess programmers use the results
>of games played by their programs to improve the engines.


This is not tuning. This is "checking". When you have worked for several weeks
to improve many things in your engine, you let it play against a bunch of
various opponents to check that, at least, your program does not perform worse
than before.

This is an elementary sanity check I would say. And the test takes a long time.

If you want to tune, you have to play against only one program and you probably
want to get a lot of games in a short time. And you repeat the test often,
ideally each time you change any evaluation parameter value. This is different.
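
As a rough illustration of the "checking" idea above (and not the author's actual procedure, which he keeps secret), here is a minimal sketch of a sanity check that only asks whether a new version scores significantly worse than the old one against the same pool of opponents. The function names, the 95% threshold and all game totals are assumptions made for the example.

```python
import math

def score(wins, draws, losses):
    """Return (fraction of points scored, number of games)."""
    games = wins + draws + losses
    return (wins + 0.5 * draws) / games, games

def passes_sanity_check(old, new, z=1.96):
    """True unless the new version scores significantly below the old one.

    old and new are (wins, draws, losses) totals against the same pool.
    """
    old_score, old_games = score(*old)
    new_score, new_games = score(*new)
    # 0.25 is the largest possible per-game variance of a result in [0, 1],
    # so this standard error is deliberately pessimistic.
    std_err = math.sqrt(0.25 / old_games + 0.25 / new_games)
    return new_score - old_score > -z * std_err

# Hypothetical totals over 300 games each.
old_version = (120, 60, 120)     # 50.0%
slightly_lower = (110, 70, 120)  # 48.3%: within noise, check passes
clearly_lower = (60, 60, 180)    # 30.0%: check fails

print(passes_sanity_check(old_version, slightly_lower))
print(passes_sanity_check(old_version, clearly_lower))
```

The check is one-sided on purpose: it does not try to prove the new version is stronger, only that it has not clearly become weaker.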



>How do you test your program? Only test positions? Or only the games played by
>members of your chess club against Tiger, which you were talking about?


My testing procedure is complex and I prefer to keep it secret.



>Why don't you think that long matches of Tiger against other top programs
>might help you to find weaknesses in Tiger?


I do think matches against programs can help me to find weaknesses in Tiger.
Exactly like games against humans do.



>Again, sorry if I upset you, Christophe; that was not my intention!


No problem. Nothing against you! :)



    Christophe
Last modified: Thu, 15 Apr 21 08:11:13 -0700