Computer Chess Club Archives

Subject: Re: How is improvements measured?

Author: Dann Corbit

Date: 11:52:57 04/27/00

On April 27, 2000 at 12:05:21, ujecrh wrote:
>On April 26, 2000 at 20:52:45, Dann Corbit wrote:
>>On April 26, 2000 at 20:41:31, Flemming Rodler wrote:
>>
>>>Hi all,
>>>
>>>I was just wondering how people determine if a new version of the program they
>>>are developing is better than the current. Are there other methods than just
>>>testing it against a whole bunch of other chess engines?
>>
>>That's the most important one, and (really) the only one you can trust [provided
>>that you include human opponents as well].
>>
>>Programmers also test with EPD test suites, but those are not reliable
>>indicators of how well a program plays.
>>
>>Sometimes the programmer will have a pretty good idea that a change has helped
>>before testing.
>
>Improvement on a single suite probably does not mean anything, but if you test
>the program against a bunch of suites and it does better on almost all of them,
>then I guess you can take it as a good sign that the engine has improved.

A good sign, yes, but a very artificial one, and quite possibly one that gives
the wrong answer.

It is easy to alter engine code so that it scores well on test suites but plays
terribly.  Look at Ed Schroder's "personality" tests and you will see that the
settings that do well on EPD test suites are not the same settings that play
well in games.
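
For anyone who wants to put a number on the game results rather than eyeball
them, here is a minimal sketch in Python. It assumes the standard logistic Elo
model, and the function names are made up for illustration; they do not come
from any engine or tool mentioned in this thread. It converts a win/draw/loss
record into an estimated rating difference and gives the standard error of the
score, which shows roughly how many games are needed before a difference means
anything.

```python
import math


def match_score(wins, draws, losses):
    """Fraction of points scored, counting a draw as half a point."""
    games = wins + draws + losses
    if games == 0:
        raise ValueError("no games played")
    return (wins + 0.5 * draws) / games


def elo_difference(wins, draws, losses):
    """Rating difference implied by the score under the logistic Elo model."""
    score = match_score(wins, draws, losses)
    if score <= 0.0 or score >= 1.0:
        raise ValueError("a 0% or 100% score gives no finite estimate")
    return -400.0 * math.log10(1.0 / score - 1.0)


def score_std_error(wins, draws, losses):
    """Standard error of the score, treating games as independent trials."""
    games = wins + draws + losses
    score = match_score(wins, draws, losses)
    # Per-game variance of the result, where a result is 1, 0.5 or 0.
    variance = (
        wins * (1.0 - score) ** 2
        + draws * (0.5 - score) ** 2
        + losses * (0.0 - score) ** 2
    ) / games
    return math.sqrt(variance / games)


if __name__ == "__main__":
    # Hypothetical record of a new version against the old one.
    w, d, l = 40, 35, 25
    print("score:", match_score(w, d, l))
    print("Elo difference:", round(elo_difference(w, d, l), 1))
    print("score std error:", round(score_std_error(w, d, l), 3))
```

If the score minus about two standard errors is still above 50%, the new
version is probably stronger against that opponent. A short match rarely
clears that bar, which is one reason results from a handful of games are as
easy to misread as test suite scores.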

Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.