Computer Chess Club Archives

Subject: Re: Test suites

Author: Christopher Conkie
Date: 10:30:20 02/01/06
On February 01, 2006 at 13:08:44, Dann Corbit wrote:

>On February 01, 2006 at 12:31:55, Uri Blass wrote:
>
>>On February 01, 2006 at 12:04:47, Dann Corbit wrote:
>>
>>>On February 01, 2006 at 11:14:36, David B Weller wrote:
>>>
>>>>I was just here trying to figure out why my engine doesn't get a certain bm for a
>>>>positional test, and it occurred to me ...
>>>>
>>>>Why would I trust that?
>>>>
>>>>Many of the basic terms, e.g., isolated pawn, have a fairly well established
>>>>value, representing a statistical average over many, many positions.
>>>>
>>>>If my engine is missing some positional move, for no other reason that I can
>>>>tell, except perhaps my isolated = 20 should be isolated = 25, then I am
>>>>disregarding the trillions of other positions where it is, statistically
>>>>speaking, really 20.
>>>>
>>>>As has been pointed out many times, these test suites are good only for
>>>>detecting gross errors.
>>>>
>>>>So if you plan on tweaking the value of your SE metrics by test suites, make
>>>>sure it has about a million positions ;-)
>>>>
>>>>Maybe this is why 'auto' tuning is hard. Because if the suite doesn't contain
>>>>enough data to be representative of all the features one is trying to tune, it
>>>>will just be a waste of time, and make it worse...
>>>>
>>>>It could be that many problems can be easily solved, simply by inflating or
>>>>deflating the right term(s). And certainly a 'genetic' algorithm would find the
>>>>right ones to inflate/deflate on a small set of positions in order to get more
>>>>of them right...
>>>>
>>>>Fact is, it could be that the very reason the position got in the test suite is
>>>>because it is a little 'freakish'. Then what? We're tuning our engines to
>>>>become worse!
>>>>
>>>>my $0.02
>>>>
>>>>IMHO
>>>>
>>>>-David
>>>
>>>And yet the really good engines tend to solve all of them, or nearly all of
>>>them.
>>
>>You are talking about tactical suites, while David was talking about positional
>>suites.
>>
>>>
>>>Of course, an equal problem to test suites is that all of them are full of
>>>outright mistakes and errors.
>>>
>>>Probably the best debugged suite is WAC and yet I imagine that it still contains
>>>errors.
>>
>>I doubt if it is the best debugged suite.
>
>I am very sure of it.  Every position has been analyzed by multiple strong
>engines at long time controls.  No other suite has had the same effort applied
>to it, as far as I know.  I think that MES is getting similar effort now.  But
>since it is a much more difficult test, it will take a long time to shake out
>all potential errors.
>
>>This suite is simply too easy, so when I use test suites to test my program I
>>prefer harder tests.
>>More interesting tactical test suites are the Arasan test suite and the ecmgcp
>>test suite, and I certainly tested Movei more often on these tests than on WAC.
>
>I agree that WAC is only useful for beginning engines and also for simple
>verification that you have not broken something.
>
>But ecmgcp and arasan test are not as carefully debugged as WAC.
>
>Because Arasan test is small, it is likely to have fewer problems than ecmgcp.
>Ecmgcp has had more debugging efforts than Arasan, so it could also be the
>reverse.
>
>I am very sure that there are still cooks in Ecmgcp but not as sure about
>Arasan.
>>Uri

I like Alessandro's and Dieter's suites. I got a lot of ideas for positions
from them. Alessandro's underpromotion suite is especially nice.

Christopher
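
David's sample-size argument quoted above can be illustrated with a rough simulation. This is only a sketch under invented assumptions: the "true" isolated pawn penalty of 20 comes from his post, but the per-position noise figure, the Gaussian model, and the function names are hypothetical and not taken from any real engine or tuner. It treats the value that best fits a suite as the average of what each position "wants", and measures how much that fitted value moves from one random suite to the next.

```python
import random
import statistics

TRUE_PENALTY = 20.0    # the statistical average from the post, in centipawns
POSITION_NOISE = 50.0  # assumed spread of the ideal value per position (invented)


def tuned_value(suite_size, rng):
    """Value that best fits one random suite: the mean of per-position ideals."""
    samples = [rng.gauss(TRUE_PENALTY, POSITION_NOISE) for _ in range(suite_size)]
    return statistics.mean(samples)


def spread_of_tuned_values(suite_size, trials=200, seed=1):
    """Standard deviation of the fitted value across many random suites."""
    rng = random.Random(seed)
    fitted = [tuned_value(suite_size, rng) for _ in range(trials)]
    return statistics.stdev(fitted)


if __name__ == "__main__":
    for size in (10, 100, 1000):
        spread = spread_of_tuned_values(size)
        print(f"suite of {size:>4} positions: fitted value varies by about {spread:.1f}")
```

Under these assumptions the spread shrinks roughly with the square root of the suite size, so a suite of a few hundred positions cannot reliably tell 20 from 25, which is the gist of the "about a million positions" remark.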
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.