Computer Chess Club Archives


Subject: Re: Test suites

Author: chandler yergin

Date: 13:44:07 02/01/06


On February 01, 2006 at 16:16:25, Dann Corbit wrote:

>On February 01, 2006 at 16:03:41, Uri Blass wrote:
>
>>On February 01, 2006 at 14:48:50, Dann Corbit wrote:
>>
>>>On February 01, 2006 at 13:27:44, Uri Blass wrote:
>>>
>>>>On February 01, 2006 at 13:08:44, Dann Corbit wrote:
>>>>
>>>>>On February 01, 2006 at 12:31:55, Uri Blass wrote:
>>>>>
>>>>>>On February 01, 2006 at 12:04:47, Dann Corbit wrote:
>>>>>>
>>>>>>>On February 01, 2006 at 11:14:36, David B Weller wrote:
>>>>>>>
>>>>>>>>I was just here trying to figure out why my engine doesn't get a certain bm for a
>>>>>>>>positional test, and it occurred to me ...
>>>>>>>>
>>>>>>>>Why would I trust that?
>>>>>>>>
>>>>>>>>Many of the basic terms, e.g., isolated pawn, have a fairly well established
>>>>>>>>value, representing a statistical average over many, many positions.
>>>>>>>>
>>>>>>>>If my engine is missing some positional move, for no other reason that I can
>>>>>>>>tell, except perhaps my isolated = 20 should be isolated = 25, then I am
>>>>>>>>disregarding the trillions of other positions where it is, statistically
>>>>>>>>speaking, really 20.
>>>>>>>>
>>>>>>>>As has been pointed out many times, these test suites are good only for
>>>>>>>>detecting gross errors.
>>>>>>>>
>>>>>>>>So if you plan on tweaking the value of your SE metrics by test suites, make
>>>>>>>>sure it has about a million positions ;-)
>>>>>>>>
>>>>>>>>Maybe this is why 'auto' tuning is hard. Because if the suite doesn't contain
>>>>>>>>enough data to be representative of all the features one is trying to tune, it
>>>>>>>>will just be a waste of time, and make it worse...
>>>>>>>>
>>>>>>>>It could be that many problems can be easily solved, simply by inflating or
>>>>>>>>deflating the right term(s). And certainly a 'genetic' algorithm would find the
>>>>>>>>right ones to inflate/deflate on a small set of positions in order to get more
>>>>>>>>of them right...
>>>>>>>>
>>>>>>>>Fact is, the very reason the position got into the test suite could be
>>>>>>>>that it is a little 'freakish'. Then what? We're tuning our engines to
>>>>>>>>become worse!
>>>>>>>>
>>>>>>>>my $0.02
>>>>>>>>
>>>>>>>>IMHO
>>>>>>>>
>>>>>>>>-David
>>>>>>>
>>>>>>>And yet the really good engines tend to solve all of them, or nearly all of
>>>>>>>them.
>>>>>>
>>>>>>You are talking about tactical suites, while David was talking about positional
>>>>>>suites.
>>>>>>
>>>>>>>
>>>>>>>Of course, an equal problem to test suites is that all of them are full of
>>>>>>>outright mistakes and errors.
>>>>>>>
>>>>>>>Probably the best debugged suite is WAC and yet I imagine that it still contains
>>>>>>>errors.
>>>>>>
>>>>>>I doubt if it is the best debugged suite.
>>>>>
>>>>>I am very sure of it.  Every position has been analyzed by multiple strong
>>>>>engines at long time controls.  No other suite has had the same effort applied
>>>>>to it as far as I know.
>>>>
>>>>I am surprised to read that, because I think that programmers usually use WAC
>>>>only at fast time controls, while they use other test suites at longer time
>>>>controls, so common sense tells me that other test suites have probably been
>>>>tested more at long time controls.
>>>>
>>>>I remember that I reported some alternative solutions in Arasan that were
>>>>corrected.
>>>>
>>>>I also reported some cases where there are additional solutions in ecmgcp.
>>>>
>>>>Note that if a cook means more than one winning move, then I am also sure that
>>>>there are many cooks in WAC.
>>>>
>>>>There are winning moves that clearly no good program is going to play, and my
>>>>opinion is that a position can be considered free of errors even if it has
>>>>more than one winning move, as long as we can practically expect all programs
>>>>to find the same move.
>>>
>>>Clearly we cannot expect it.  If every program made the same move as the others
>>>there would be no need even to play them against each other.  And if one program
>>>finds a different (and potentially even better) solution to a problem and yet is
>>>scored as having failed the position, then clearly it is the position that is
>>>broken and not the program.
>>
>>I think that there is a better way to treat two winning moves.
>>
>>If one move gives mate in 2 and one move wins the queen, then for me winning the
>>queen is the wrong solution for practical purposes, even if I am sure that it
>>wins the game, because I expect strong programs not to find it.
>
>If there is a mate in 1 and a mate in 12000, then they are both solutions.
>If one solution is a mate and the other is not, then the other [non-mate] may or
>may not be a solution.
>
>>If you check the WAC test this way, you may find that many of its solutions
>>are not correct, because the side that has mate in 2 can get a rook advantage
>>and win the game more slowly.
>
>If they are certain wins, then they are also solutions.  The object of the game
>is to win if you can win, else draw if you can draw.
>
>>>Every winning move (for a won position) is one of the solutions and if the
>>>solutions are missing then the solution should be corrected.
>>
>>I will not be surprised if by your definitions most of the WAC positions should
>>be corrected.
>
>They should be corrected if proven.
By whom? The book is written.
>For instance, if one move is a mate and the other is not proven to be a mate,
>then it is not a correction yet.
Who is going to print the correction?

>But if it can be absolutely proven to win,
>then it is an alternative solution.
Why should an alternate solution be given any credibility?
The "best" move leading to Mate IS the solution.
cy
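
The disagreement above comes down to how a test-suite record is scored: credit only the single "best" move, or credit every move the record lists as a solution. Below is a minimal sketch of both policies, assuming the suite is stored as EPD records with a `bm` operation. The helper names (`parse_epd`, `is_solved`), the `accept_alternatives` flag, and the sample record (including its second listed move) are hypothetical and exist only for illustration; the record is not taken from WAC or any other suite.

```python
# Illustrative only: a made-up record, not from any published suite.
# The second "bm" move is listed purely to exercise the code.
RECORD = '6k1/5ppp/8/8/8/8/8/R3K3 w - - bm Ra8# Ra7; id "demo.1";'


def parse_epd(line):
    """Split one EPD record into (position, operations dict).

    Assumes no ';' appears inside a quoted operand.
    """
    fields = line.strip().split(None, 4)
    position = " ".join(fields[:4])
    rest = fields[4] if len(fields) > 4 else ""
    ops = {}
    for chunk in rest.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        opcode, _, operand = chunk.partition(" ")
        ops[opcode] = operand.strip()
    return position, ops


def normalize(move):
    """Drop check, mate, and annotation suffixes so 'Ra8#' matches 'Ra8'."""
    return move.rstrip("+#!?")


def is_solved(engine_move, ops, accept_alternatives=True):
    """Return True if the engine's move counts as a solution.

    accept_alternatives=True  -> any move listed under bm is accepted.
    accept_alternatives=False -> only the first listed move is accepted.
    """
    listed = [normalize(m) for m in ops.get("bm", "").split()]
    if not listed:
        return False
    accepted = listed if accept_alternatives else listed[:1]
    return normalize(engine_move) in accepted


position, ops = parse_epd(RECORD)
strict = is_solved("Ra7", ops, accept_alternatives=False)
lenient = is_solved("Ra7", ops, accept_alternatives=True)
```

With this record, the same engine move fails under the strict policy and passes under the lenient one, so a suite's reported score depends on which definition of "solution" its maintainer adopted as much as on the engine.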


