Computer Chess Club Archives


Subject: Re: Test suites

Author: chandler yergin

Date: 15:29:10 02/01/06



On February 01, 2006 at 17:15:11, Dann Corbit wrote:

>On February 01, 2006 at 16:44:07, chandler yergin wrote:
>
>>On February 01, 2006 at 16:16:25, Dann Corbit wrote:
>>
>>>On February 01, 2006 at 16:03:41, Uri Blass wrote:
>>>
>>>>On February 01, 2006 at 14:48:50, Dann Corbit wrote:
>>>>
>>>>>On February 01, 2006 at 13:27:44, Uri Blass wrote:
>>>>>
>>>>>>On February 01, 2006 at 13:08:44, Dann Corbit wrote:
>>>>>>
>>>>>>>On February 01, 2006 at 12:31:55, Uri Blass wrote:
>>>>>>>
>>>>>>>>On February 01, 2006 at 12:04:47, Dann Corbit wrote:
>>>>>>>>
>>>>>>>>>On February 01, 2006 at 11:14:36, David B Weller wrote:
>>>>>>>>>
>>>>>>>>>>I was just here trying to figure out why my engine doesn't get a certain bm for a
>>>>>>>>>>positional test, and it occurred to me ...
>>>>>>>>>>
>>>>>>>>>>Why would I trust that?
>>>>>>>>>>
>>>>>>>>>>Many of the basic terms, eg., isolated pawn, have a fairly well established
>>>>>>>>>>value, representing a statistical average over many, many positions
>>>>>>>>>>
>>>>>>>>>>If my engine is missing some positional move, for no other reason than I can
>>>>>>>>>>tell, except perhaps my isolated = 20 should be isolated = 25, then I am
>>>>>>>>>>disregarding the trillions of other positions where it is, statistically
>>>>>>>>>>speaking, really 20
>>>>>>>>>>
>>>>>>>>>>As it has been pointed out many times, these test suites are good only for
>>>>>>>>>>detecting gross errors
>>>>>>>>>>
>>>>>>>>>>So if you plan on tweaking the value of your SE metrics by test suites, make
>>>>>>>>>>sure it has about a million positions ;-)
>>>>>>>>>>
>>>>>>>>>>Maybe this is why 'auto' tuning is hard. Because if the suite doesn't contain
>>>>>>>>>>enough data to be representative of all the features one is trying to tune, it
>>>>>>>>>>will just be a waste of time, and make it worse...
>>>>>>>>>>
>>>>>>>>>>It could be that many problems can be easily solved, simply by inflating or
>>>>>>>>>>deflating the right term(s). And certainly a 'genetic' algorithm would find the
>>>>>>>>>>right ones to inflate/deflate on a small set of positions in order to get more
>>>>>>>>>>of them right...
>>>>>>>>>>
>>>>>>>>>>Fact is, it could be the very reason the position got in the test suite, is
>>>>>>>>>>because it is a little 'freakish'. Then what? We're tuning our engines to
>>>>>>>>>>become worse!
>>>>>>>>>>
>>>>>>>>>>my $0.02
>>>>>>>>>>
>>>>>>>>>>IMHO
>>>>>>>>>>
>>>>>>>>>>-David
>>>>>>>>>
>>>>>>>>>And yet the really good engines tend to solve all of them, or nearly all of
>>>>>>>>>them.
>>>>>>>>
>>>>>>>>You are talking about tactical suites when David was talking about positional
>>>>>>>>suites.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>Of course, an equal problem to test suites is that all of them are full of
>>>>>>>>>outright mistakes and errors.
>>>>>>>>>
>>>>>>>>>Probably the best debugged suite is WAC and yet I imagine that it still contains
>>>>>>>>>errors.
>>>>>>>>
>>>>>>>>I doubt if it is the best debugged suite.
>>>>>>>
>>>>>>>I am very sure of it.  Every position has been analyzed by multiple strong
>>>>>>>engines for long time control.  No other suite has the same effort applied to it
>>>>>>>as far as I know.
>>>>>>
>>>>>>I am surprised to read it because
>>>>>>I think that programmers usually use WAC only at fast time control when they use
>>>>>>other test suites at longer time control so common sense tells me that other
>>>>>>test suites were probably tested more at long time control.
>>>>>>
>>>>>>I remember that I reported about some alternative solutions in arasan that were
>>>>>>corrected.
>>>>>>
>>>>>>I also reported about some cases when there are additional solutions in ecmgcp.
>>>>>>
>>>>>>Note that if a cook means more than one winning move then I am also sure that
>>>>>>there are many cooks in WAC.
>>>>>>
>>>>>>There are winning moves that it is clear that no good program is going to play
>>>>>>and my opinion is that position can be considered as position with no errors
>>>>>>even if it has more than one winning move as long as we can practically expect
>>>>>>all programs to find the same move.
>>>>>
>>>>>Clearly we cannot expect it.  If every program made the same move as the others
>>>>>there would be no need even to play them against each other.  And if one program
>>>>>finds a different (and potentially even better) solution to a problem and yet is
>>>>>scored as having failed the position, then clearly it is the position that is
>>>>>broken and not the program.
>>>>
>>>>I think that between 2 winning moves there is a better solution.
>>>>
>>>>If one move give mate in 2 and one move wins the queen then for me winning the
>>>>queen is wrong solution for practical purposes even if I am sure that it wins
>>>>the game because I expect strong programs not to find it.
>>>
>>>If there is a mate in 1 and a mate in 12000, then they are both solutions.
>>>If one solution is a mate and the other is not, then the other [non-mate] may or
>>>may not be a solution.
>>>
>>>>If you will check the WAC test by this way you may find that many solutions of
>>>>it are not correct because the side that has mate in 2 can get rook advantage
>>>>and win the game more slowly.
>>>
>>>If they are certain wins, then they are also solutions.  The object of the game
>>>is to win if you can win, else draw if you can draw.
>>>
>>>>>Every winning move (for a won position) is one of the solutions and if the
>>>>>solutions are missing then the solution should be corrected.
>>>>
>>>>I will not be surprised if by your definitions most of the WAC positions should
>>>>be corrected.
>>>
>>>They should be corrected if proven.
>>By whom? The book is written.
>
>I am talking about the test suite and not about the book.  Books can contain
>mistakes.
>
>>>For instance, if one move is a mate and the other is not proven to be a mate,
>>>then it is not a correction yet.
>>Who is going to Print the correction?
>
>It does not matter if the book is corrected or not.  The test suite can be
>corrected.  If someone wants to write it up, then that is a bonus.
>
>>  But if it can be absolutely proven to win,
>>>then it is an alternative solution.
>>Why should an alternate solution be given any credibility?
>
>If it is proven to win then it is credible.
>
>>The "best" move leading to Mate IS the solution.
>
>If there are multiple moves leading to mate then there are multiple solutions.

Yep, I guess that's called a "Cook" at least in Problem Compositions..
;)
>It is also possible for moves to be published as best moves that actually lead
>to a direct loss.
Indeed!

 I know of at least one of these (not in WAC though).
I know there are many so-called "Test Positions" that are completely wrong.
I have Posted several.

  There
>are some WAC positions that are questionable as to whether they win or not.
Of course! I have Posted some of these too.
But, I'm not going to dig through Archives..
Just Post some of the EPDs I have listed; and do the Analysis yourself.
OK?
Thanks,
Chan
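
For anyone who wants to score a suite with alternative solutions counted, here is a minimal Python sketch. It relies on the fact that the EPD "bm" opcode can list several moves, so a cooked position can be repaired in the suite file itself by adding the extra winning move. The position below is made up for illustration (two back-rank mates in one) and is not from WAC; the helper names parse_epd, normalize and is_solved are hypothetical. As a simplification, it assumes no ";" appears inside a quoted operand.

```python
def parse_epd(line):
    """Split an EPD record into its 4 position fields and an opcode -> operand dict."""
    fields = line.strip().split(None, 4)
    position = " ".join(fields[:4])
    ops = {}
    if len(fields) == 5:
        # Operations are separated by ';' (simplification: no ';' inside quotes).
        for op in fields[4].split(";"):
            op = op.strip()
            if op:
                opcode, _, operand = op.partition(" ")
                ops[opcode] = operand.strip()
    return position, ops


def normalize(san):
    """Drop check, mate and annotation suffixes so 'Ra8+' and 'Ra8#' compare equal."""
    return san.rstrip("+#!?")


def is_solved(ops, engine_move):
    """Count the position as solved if the engine played ANY listed best move.

    If only 'am' (avoid move) is present, any move not listed there counts.
    """
    best = {normalize(m) for m in ops.get("bm", "").split()}
    avoid = {normalize(m) for m in ops.get("am", "").split()}
    move = normalize(engine_move)
    if best:
        return move in best
    return bool(avoid) and move not in avoid


# Made-up example record: both rook moves mate on the back rank.
line = '6k1/5ppp/8/8/8/8/8/R3R1K1 w - - bm Ra8# Re8#; id "demo.1";'
position, ops = parse_epd(line)
for played in ("Ra8#", "Re8#", "Kf1"):
    print(played, is_solved(ops, played))
```

With this kind of scoring, an engine that finds the second mate is not marked as failing the position. Whether a slower non-mate win should also be listed under "bm" is the judgment call argued over in the quoted messages, and the code does not decide it.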




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.