Author: Dann Corbit
Date: 16:09:52 06/15/04
On June 15, 2004 at 18:42:15, David Dahlem wrote:

>On June 15, 2004 at 17:57:56, Dann Corbit wrote:
>
>>On June 15, 2004 at 17:45:49, David Dahlem wrote:
>>
>>>On June 15, 2004 at 17:36:02, Dann Corbit wrote:
>>>
>>>>On June 15, 2004 at 17:29:39, David Dahlem wrote:
>>>>
>>>>>On June 15, 2004 at 17:16:14, Dann Corbit wrote:
>>>>>
>>>>>>On June 15, 2004 at 17:05:57, David Dahlem wrote:
>>>>>>
>>>>>>>On June 15, 2004 at 16:44:58, Dann Corbit wrote:
>>>>>>>
>>>>>>>>On June 15, 2004 at 16:00:08, David Dahlem wrote:
>>>>>>>>
>>>>>>>>>On June 15, 2004 at 15:54:23, Gian-Carlo Pascutto wrote:
>>>>>>>>>
>>>>>>>>>>On June 15, 2004 at 15:33:41, David Dahlem wrote:
>>>>>>>>>>
>>>>>>>>>>>One of the problems with the current method of testing engines with test
>>>>>>>>>>>suites (e.g. the WM-Test) is proving that the proposed solution move is
>>>>>>>>>>>actually the best move, especially in positions of a positional nature.
>>>>>>>>>>>Perhaps a new method would avoid this problem, namely a suite of mate
>>>>>>>>>>>positions with known, more easily proven solutions? Time to solution
>>>>>>>>>>>could be the criterion by which engines are evaluated.
>>>>>>>>>>>
>>>>>>>>>>>Just an idea. Any thoughts? Would this work?
>>>>>>>>>>
>>>>>>>>>>As long as the idea is to test mate-finder speeds, this is fine.
>>>>>>>>>>
>>>>>>>>>>Don't expect to get an indication of playing strength, though.
>>>>>>>>>>
>>>>>>>>>>--
>>>>>>>>>>GCP
>>>>>>>>>
>>>>>>>>>Well, this was just an idea, an unproven theory, but I would think some
>>>>>>>>>kind of formula could be developed, and I would also think stronger
>>>>>>>>>engines would score higher than weaker engines. :-)
>>>>>>>>
>>>>>>>>Probably they would. But what is the relationship?
>>>>>>>>
>>>>>>>>For instance, if I ride ten miles on my bike at 20 MPH, and I jog 5 miles
>>>>>>>>down a trail at 10 MPH, what is the conversion for benefit between the
>>>>>>>>two forms of exercise?
>>>>>>>
>>>>>>>Well, that's apples and oranges. More valid would be to time you on your
>>>>>>>bike to the finish line against someone else's time to the finish
>>>>>>>line. :-)
>>>>>>
>>>>>>That's my point. Both comparisons are apples to oranges.
>>>>>
>>>>>Comparison of elapsed time to the finish line over a certain distance
>>>>>between two competitors is like comparing apples and oranges? Then all
>>>>>horse races, vehicle races, etc. are meaningless?
>>>>
>>>>I take a horse and run him without a rider. Now, I am going to use this to
>>>>predict how he will run with a rider. Maybe there is a direct correlation,
>>>>and maybe there isn't. And if there is a direct correlation, what is it?
>>>>
>>>>A test suite does not predict how well an engine will play. If it did,
>>>>then Beowulf would beat Shredder 6, because Beowulf scored 288/300 on WAC
>>>>at 5 seconds, and Shredder 6 scored 285 (on a certain machine). Of course,
>>>>Shredder would pound the ever-loving stuffings out of Beowulf in actual
>>>>game play.
>>>
>>>I agree totally; that's what got me thinking about test suites, and the
>>>reason I started this thread, hoping to start a dialog on better testing
>>>methods. Using mate problems may not be accurate enough either, but it
>>>seems to me a better method than using positions where the proposed "best
>>>move" is not proven to be best.
>>
>>I don't think it is better. During 90% or more of the moves in a chess
>>game, you will not be seeking a checkmate. You are striving to improve your
>>position. You are striving to win material. You are striving to put the
>>enemy king into peril (not necessarily a checkmate). You are striving to
>>improve your pawn formation. You are striving to produce a passed pawn.
>>Checkmate tests do not help in these areas, except by chance.
>>
>>Imagine the opening board. How will searching for the best checkmate in
>>this position make the program better? Clearly it won't have any bearing on
>>program strength.
>
>Yes, you are obviously quite right in all these points. By the way, what is
>the best checkmate from the opening position? :-)

I also appreciate your interest in reopening the topic. You bring up an interesting point: the proposed solution to a test problem may not be the best solution. A further (also correct) point is that the only way to know whether a solution is optimal is to PROVE it with a program like Chest. So we are left with "dirty" test suites that clearly have bugs in them. Where to go from here?

One thing that can be done is to debug the test suites, as if they were programs or engineering designs. Of course, until we forge ahead to mate (or forced loss, or forced draw), the solution is in question, so we should think about how much uncertainty we can live with.

One possible work-around is to analyze the test problems to a very high depth or for a very long time (say, depth = 16 plies or time = 1 hour) when the problems are meant to be solved in a fairly short time (e.g. 10 minutes or less). If we run a dozen of the strongest programs against the problem sets at a very long time control or a very deep depth (or both), then we can be reasonably sure that there are no "obvious" alternative answers at shallower depths. And if one does turn up, we can alter the test set.
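A minimal sketch of that work-around, assuming UCI engines driven through the python-chess library (the engine paths, the depth, and the suite file name are placeholders for illustration, not recommendations):

import chess
import chess.engine

# Hypothetical engine paths; substitute a dozen of the strongest programs.
ENGINES = ["/path/to/engine_a", "/path/to/engine_b"]
DEPTH = 16  # "very high depth" in the sense used above

def audit_suite(epd_file):
    engines = [chess.engine.SimpleEngine.popen_uci(p) for p in ENGINES]
    try:
        with open(epd_file) as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                board = chess.Board()
                ops = board.set_epd(line)    # position plus EPD opcodes
                labeled = ops.get("bm", [])  # the suite's "best move(s)"
                for eng in engines:
                    info = eng.analyse(board,
                                       chess.engine.Limit(depth=DEPTH))
                    choice = info["pv"][0]
                    if labeled and choice not in labeled:
                        # A strong engine disagrees at high depth, so the
                        # labeled solution is suspect: flag it for review.
                        print(ops.get("id", "?"), "suspect:",
                              "engine prefers", board.san(choice),
                              "suite says", [board.san(m) for m in labeled])
    finally:
        for eng in engines:
            eng.quit()

audit_suite("suite.epd")  # hypothetical EPD file of test positions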
Many problems in WAC were resolved by a method similar to this. I am sure that there are other ways to improve the testing as well.
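For completeness, the time-to-solution criterion proposed at the top of the thread can be sketched the same way: start an analysis and record how long the engine takes before it first reports a forced mate. Again, python-chess, the engine path, and the sample position are assumptions made purely for illustration:

import time
import chess
import chess.engine

def time_to_mate(engine_path, fen, timeout=600.0):
    # Returns seconds until the engine first reports a forced mate,
    # or None if no mate is reported within the timeout.
    board = chess.Board(fen)
    engine = chess.engine.SimpleEngine.popen_uci(engine_path)
    start = time.monotonic()
    try:
        with engine.analysis(board,
                             chess.engine.Limit(time=timeout)) as analysis:
            for info in analysis:
                score = info.get("score")
                if score is not None and score.is_mate():
                    return time.monotonic() - start
        return None
    finally:
        engine.quit()

# Example with a trivial back-rank mate in one (Re8#):
print(time_to_mate("/path/to/engine", "6k1/5ppp/8/8/8/8/8/4R2K w - - 0 1"))

Ranking engines by this number measures mate-finder speed directly, which is exactly the limitation noted earlier in the thread: it says little about the 90% of moves where no mate is on the board.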