Computer Chess Club Archives


Subject: Re: Test suites

Author: Dann Corbit

Date: 10:08:44 02/01/06


On February 01, 2006 at 12:31:55, Uri Blass wrote:

>On February 01, 2006 at 12:04:47, Dann Corbit wrote:
>
>>On February 01, 2006 at 11:14:36, David B Weller wrote:
>>
>>>I was just here trying to figure out why my engine doesn't get a certain bm for a
>>>positional test, and it occurred to me ...
>>>
>>>Why would I trust that?
>>>
>>>Many of the basic terms, e.g., isolated pawn, have a fairly well established
>>>value, representing a statistical average over many, many positions.
>>>
>>>If my engine is missing some positional move, for no reason that I can tell
>>>except perhaps that my isolated = 20 should be isolated = 25, then by changing
>>>it I am disregarding the trillions of other positions where it is,
>>>statistically speaking, really 20.
>>>
>>>As has been pointed out many times, these test suites are good only for
>>>detecting gross errors.
>>>
>>>So if you plan on tweaking the value of your SE metrics by test suites, make
>>>sure it has about a million positions ;-)
>>>
>>>Maybe this is why 'auto' tuning is hard. If the suite doesn't contain
>>>enough data to be representative of all the features one is trying to tune, it
>>>will just be a waste of time, and make it worse...
>>>
>>>It could be that many problems can be easily solved, simply by inflating or
>>>deflating the right term(s). And certainly a 'genetic' algorithm would find the
>>>right ones to inflate/deflate on a small set of positions in order to get more
>>>of them right...
>>>
>>>Fact is, the very reason a position got into the test suite could be that it
>>>is a little 'freakish'. Then what? We're tuning our engines to
>>>become worse!
>>>
>>>my $0.02
>>>
>>>IMHO
>>>
>>>-David
>>
>>And yet the really good engines tend to solve all of them, or nearly all of
>>them.
>
>You are talking about tactical suites, whereas David was talking about
>positional suites.
>
>>
>>Of course, an equal problem to test suites is that all of them are full of
>>outright mistakes and errors.
>>
>>Probably the best debugged suite is WAC and yet I imagine that it still contains
>>errors.
>
>I doubt if it is the best debugged suite.

I am very sure of it.  Every position has been analyzed by multiple strong
engines at long time controls.  No other suite has had the same effort applied
to it, as far as I know.  I think that MES is getting similar effort now.  But
since it is a much more difficult test, it will take a long time to shake out
all potential errors.
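
That kind of cross-check could be sketched roughly as follows, in Python.  This
is only an illustration, not the procedure actually used on WAC or MES: the
function names, the `min_agree` threshold, and the two sample records with
their "engine" answers are all made up for the example, and the EPD parsing is
simplified (it would break on a quoted operand that contains a semicolon).  A
position is flagged when several engines agree on a move that is not among the
suite's `bm` moves, which marks it as worth a second look, not as a proven cook.

```python
from collections import Counter


def norm(move):
    """Drop check/mate/annotation suffixes so 'Qg6+' compares equal to 'Qg6'."""
    return move.rstrip("+#!?")


def parse_epd(line):
    """Split an EPD record into (position, {opcode: operand}).

    Simplified: assumes no ';' occurs inside a quoted operand.
    """
    fields = line.strip().split(None, 4)
    position = " ".join(fields[:4])
    ops = {}
    if len(fields) > 4:
        for op in fields[4].split(";"):
            op = op.strip()
            if not op:
                continue
            opcode, _, operand = op.partition(" ")
            ops[opcode] = operand.strip().strip('"')
    return position, ops


def flag_suspects(epd_lines, engine_moves, min_agree=2):
    """Return (id, bm moves, engine move, votes) for records to re-examine.

    engine_moves maps a record id to the moves chosen by different engines.
    A record is flagged when at least min_agree engines chose the same move
    and that move is not one of the record's bm moves.
    """
    suspects = []
    for line in epd_lines:
        _, ops = parse_epd(line)
        best = {norm(m) for m in ops.get("bm", "").split()}
        votes = Counter(norm(m) for m in engine_moves.get(ops.get("id"), []))
        if not votes:
            continue  # no engine results for this record
        move, count = votes.most_common(1)[0]
        if count >= min_agree and move not in best:
            suspects.append((ops["id"], sorted(best), move, count))
    return suspects


# Made-up records and made-up engine answers, for illustration only.
SUITE = [
    'rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - bm e4; id "demo.001";',
    'rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq - bm e5; id "demo.002";',
]
ENGINE_MOVES = {
    "demo.001": ["e4", "e4", "d4"],
    "demo.002": ["c5", "c5", "c5"],
}

if __name__ == "__main__":
    for suspect in flag_suspects(SUITE, ENGINE_MOVES):
        print(suspect)
```

Agreement among engines is not proof either way, since engines can share the
same blind spot, so anything flagged like this would still need long analysis
before the record is changed.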

>This suite is simply too easy, so when I use test suites to test my program I
>prefer harder tests.
>More interesting tactical test suites are the Arasan test suite and the Ecmgcp
>test suite, and I have certainly tested Movei more often on these than on WAC.

I agree that WAC is only useful for beginning engines and also for simple
verification that you have not broken something.

But Ecmgcp and the Arasan test are not as carefully debugged as WAC.

Because the Arasan test is small, it is likely to have fewer problems than
Ecmgcp.  On the other hand, Ecmgcp has had more debugging effort than Arasan,
so it could also be the reverse.

I am very sure that there are still cooks in Ecmgcp but not as sure about
Arasan.
>Uri



Last modified: Thu, 15 Apr 21 08:11:13 -0700
