Computer Chess Club Archives

Subject: Re: Test suites

Author: Dann Corbit

Date: 11:48:50 02/01/06

On February 01, 2006 at 13:27:44, Uri Blass wrote:

>On February 01, 2006 at 13:08:44, Dann Corbit wrote:
>
>>On February 01, 2006 at 12:31:55, Uri Blass wrote:
>>
>>>On February 01, 2006 at 12:04:47, Dann Corbit wrote:
>>>
>>>>On February 01, 2006 at 11:14:36, David B Weller wrote:
>>>>
>>>>>I was just here trying to figure out why my engine doesn't get a certain bm for a
>>>>>positional test, and it occurred to me ...
>>>>>
>>>>>Why would I trust that?
>>>>>
>>>>>Many of the basic terms, e.g., isolated pawn, have a fairly well established
>>>>>value, representing a statistical average over many, many positions.
>>>>>
>>>>>If my engine is missing some positional move, for no reason that I can tell
>>>>>except perhaps that my isolated = 20 should be isolated = 25, then by changing
>>>>>it I am disregarding the trillions of other positions where it is,
>>>>>statistically speaking, really 20.
>>>>>
>>>>>As has been pointed out many times, these test suites are good only for
>>>>>detecting gross errors.
>>>>>
>>>>>So if you plan on tweaking the value of your SE metrics by test suites, make
>>>>>sure it has about a million positions ;-)
>>>>>
>>>>>Maybe this is why 'auto' tuning is hard. If the suite doesn't contain
>>>>>enough data to be representative of all the features one is trying to tune,
>>>>>tuning will just be a waste of time, and will make the engine worse...
>>>>>
>>>>>It could be that many problems can be easily solved, simply by inflating or
>>>>>deflating the right term(s). And certainly a 'genetic' algorithm would find the
>>>>>right ones to inflate/deflate on a small set of positions in order to get more
>>>>>of them right...
>>>>>
>>>>>Fact is, the very reason a position got into the test suite could be that
>>>>>it is a little 'freakish'. Then what? We're tuning our engines to
>>>>>become worse!
>>>>>
>>>>>my $0.02
>>>>>
>>>>>IMHO
>>>>>
>>>>>-David
>>>>
>>>>And yet the really good engines tend to solve all of them, or nearly all of
>>>>them.
>>>
>>>You are talking about tactical suites, while David was talking about positional
>>>suites.
>>>
>>>>
>>>>Of course, an equal problem to test suites is that all of them are full of
>>>>outright mistakes and errors.
>>>>
>>>>Probably the best debugged suite is WAC and yet I imagine that it still contains
>>>>errors.
>>>
>>>I doubt if it is the best debugged suite.
>>
>>I am very sure of it.  Every position has been analyzed by multiple strong
>>engines for long time control.  No other suite has the same effort applied to it
>>as far as I know.
>
>I am surprised to read that, because I think programmers usually run WAC only
>at fast time controls and run other test suites at longer time controls, so
>common sense tells me that other test suites have probably been tested more at
>long time controls.
>
>I remember that I reported some alternative solutions in Arasan that were
>corrected.
>
>I also reported some cases where there are additional solutions in ecmgcp.
>
>Note that if a cook means a position with more than one winning move, then I am
>also sure that there are many cooks in WAC.
>
>There are winning moves that clearly no good program is going to play, and my
>opinion is that a position can be considered free of errors even if it has
>more than one winning move, as long as we can practically expect all programs
>to find the same move.

Clearly we cannot expect it.  If every program made the same move as the others
there would be no need even to play them against each other.  And if one program
finds a different (and potentially even better) solution to a problem and yet is
scored as having failed the position, then clearly it is the position that is
broken and not the program.

Every winning move (for a won position) is one of the solutions, and if any of
those solutions are missing then the position's solution list should be corrected.

Every drawing move (for a drawn position) is the same.

There may also be positions that are dead lost.  For these positions, there are
no solutions and they should be removed from any test suite that contains them
(unless there is some dominating reason to keep them).
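
The scoring rule I am arguing for can be sketched roughly as follows. This is only an illustration under my own assumptions: the function names `accepted_moves` and `score_suite` are hypothetical, the parser handles only a simplified EPD-style line with a `bm` opcode, and the three records are invented for the example and do not come from WAC or any other real suite.

```python
import re

def accepted_moves(epd_line):
    """Return the set of moves listed after the 'bm' opcode of one record.

    Assumes a simplified EPD-style line in which 'bm' is followed by one or
    more moves separated by spaces and terminated by a semicolon.
    """
    match = re.search(r"\bbm\s+([^;]+);", epd_line)
    return set(match.group(1).split()) if match else set()

def score_suite(records, engine_moves):
    """Return (solved, scored) for a list of records and the moves played.

    A move counts as solved if it matches ANY accepted move. Records with no
    accepted move (for example, dead lost positions) are skipped, not failed.
    """
    solved = scored = 0
    for epd_line, played in zip(records, engine_moves):
        solutions = accepted_moves(epd_line)
        if not solutions:
            continue  # nothing to solve here, so do not score it
        scored += 1
        if played in solutions:
            solved += 1
    return solved, scored

# Invented example records (hypothetical ids, not taken from any real suite).
records = [
    '6k1/5ppp/8/8/8/8/5PPP/R5K1 w - - bm Ra8#; id "demo.1";',
    '7k/8/5K2/8/8/8/8/6Q1 w - - bm Qg7# Kf7; id "demo.2";',
    '7k/8/8/8/8/8/5q2/7K w - - id "demo.3";',  # no bm: treated as unsolvable
]
engine_moves = ["Ra8#", "Kf7", "Kh1"]

solved, scored = score_suite(records, engine_moves)
print(f"solved {solved} of {scored} scored positions")
```

In the second record the program plays the slower win and is still credited, and the third record is left out of the total instead of being counted as a failure.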



Last modified: Thu, 15 Apr 21 08:11:13 -0700
