Computer Chess Club Archives


Subject: Re: Evaluation comparative test for Amateur Engines (PROPOSAL)

Author: Tom Likens

Date: 12:01:44 02/20/04


On February 20, 2004 at 14:36:28, Jaime Benito de Valle Ruiz wrote:

>Most people keep using test sets to find out how "good" their engines are doing;
>this gives us a rough idea of their strength. Of course, real games are better
>for this purpose.
>
>Tom Romstad just posted a message asking for opinions regarding a particular
>static evaluation, and many others answered by giving the score given by their
>engines. I find this most interesting:
>Although there is no such thing as a "perfect score" for a position, and the
>playing style of an engine strongly depends on this value, I'm sure most here
>will agree that there must be a sensible range of values to be considered
>reasonable. Surely, these upper and lower bounds could be set tighter or wider
>depending on the nature of the position.
>I could be wrong, but I would expect many engines with similar strength to give
>scores within a reasonable range for a full list of static test positions.
>
>If I (or anyone else) provide you with a list of positions, would you be
>interested in providing the static values that your engine gets for each
>position? If your engine can read EPD files, adapting the code to read each
>position and write another file with the static scores should be fairly
>straightforward.
>I'm sure this information could be extremely useful for many to find potential
>flaws in their engines using an easy automatic process.
>
>We could compile one or more of these static tests (similar to EPDs) and suggest
>a range of values for each position based on the ones given by the strongest
>engines.
>
>Example: (with fictitious scores)
>----------------------------------
>
>Test File:
>
>3r2k1/3b2p1/1q2p2p/4P3/3NQ1P1/bP5P/PpP1N3/1K1R4 w - -; id "Test 0001"
>r2k1b1r/2p2ppp/p3q3/2PpN3/Q2Pn3/4B3/PP3PPP/1R2R1K1 b - -; id "Test 0002"
>1r2r2k/1bq3p1/p1p1Bp1p/1p3Q2/3PP3/1PnN2P1/5P1P/R3R1K1 b - -; id "Test 0003"
>......
>
>Output (3 engines):
>
>      Engine A   Engine B   Engine C            Range
>
>0001:  +0.42       +0.12      +0.52        [ 0.12 ,  0.52]
>0002:  +3.00       +2.83      +3.42        [ 2.83 ,  3.42]
>0003:  -1.23       -0.88      -1.24        [-1.24 , -0.88]
>.....
>
>Let me know if you're interested.
>Regards,
>
>  Jaime

Hello Jaime,

I'd be interested.  It might be nice if the test suite were broken up into
different sections.  For example, a large number of positions evaluating king
safety would be useful.  It might also be interesting if we had a number of
unbalanced ultra-dynamic positions (3 minors vs. queen, 2 rooks vs. 3 minors
etc.)  Positions that share a similar theme could also be enlightening: two
bishops type positions, minor vs. minor type games, hogs on the 7th and so
forth.  Obviously, this could be a *very* long list, but it would need to be
long to be really useful.
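
To make the idea a bit more concrete, here is a rough Python sketch of how a
sectioned suite could be checked against the suggested ranges.  Everything in
it is an assumption for illustration: the theme labels, the REFERENCE table
layout and the function names are made up, and the numbers are just the
fictitious scores from the quoted example.

```python
from collections import defaultdict

# test id -> (theme, scores reported by the reference engines).
# Scores are the fictitious ones from the quoted example; themes are invented.
REFERENCE = {
    "0001": ("king safety", [0.42, 0.12, 0.52]),
    "0002": ("material imbalance", [3.00, 2.83, 3.42]),
    "0003": ("king safety", [-1.23, -0.88, -1.24]),
}


def score_range(scores):
    """Return the (low, high) bounds spanned by the reference scores."""
    return min(scores), max(scores)


def out_of_range_by_theme(engine_scores, reference=REFERENCE, margin=0.0):
    """Group the ids of positions whose score falls outside the range, by theme.

    engine_scores maps test id -> static score in pawns.
    margin widens every range by the same amount on both sides.
    """
    flagged = defaultdict(list)
    for test_id, (theme, scores) in reference.items():
        if test_id not in engine_scores:
            continue  # this engine did not report the position
        low, high = score_range(scores)
        score = engine_scores[test_id]
        if score < low - margin or score > high + margin:
            flagged[theme].append(test_id)
    return dict(flagged)


if __name__ == "__main__":
    my_engine = {"0001": 0.95, "0002": 3.10, "0003": -1.00}
    print(out_of_range_by_theme(my_engine))
```

Grouping the misses by theme is the point of splitting the suite into
sections: several out-of-range scores in one section hint at which part of the
evaluation to look at first.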

The other aspect of this is the information returned by the various evaluation
routines.  Djinn, Crafty and many other engines list multiple components of the
evaluation routine (type "help eval" at the Djinn command line to see the
options available).  Seeing a score of +1.35 indicates the trend but doesn't
give any real detail and IMHO is of limited value.  We might even decide on a
minimum set of parameters to produce and display (king safety, pawn structure,
space etc.)
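
As a sketch of what such a minimum set might look like in practice, the Python
below holds one evaluation breakdown per engine and reports the component on
which two engines disagree most.  The field names simply mirror the three
terms mentioned above, the class and function names are hypothetical, and the
numbers are made up; this is not the output format of Djinn, Crafty or any
other engine.

```python
from dataclasses import dataclass, asdict


@dataclass
class EvalBreakdown:
    """Per-component static evaluation in pawns (illustrative fields only)."""
    king_safety: float
    pawn_structure: float
    space: float

    def total(self):
        """Sum of the listed components; a real engine has many more terms."""
        return self.king_safety + self.pawn_structure + self.space


def largest_disagreement(a, b):
    """Return (component name, absolute difference) for the biggest gap."""
    diffs = {
        name: abs(value - getattr(b, name))
        for name, value in asdict(a).items()
    }
    name = max(diffs, key=diffs.get)
    return name, diffs[name]


if __name__ == "__main__":
    # Made-up breakdowns of the same position from two engines.
    engine_a = EvalBreakdown(king_safety=0.80, pawn_structure=0.30, space=0.25)
    engine_b = EvalBreakdown(king_safety=0.20, pawn_structure=0.35, space=0.30)
    name, gap = largest_disagreement(engine_a, engine_b)
    print(f"totals: {engine_a.total():+.2f} vs {engine_b.total():+.2f}")
    print(f"largest gap: {name} ({gap:.2f})")
```

Two engines can agree or disagree on the total for very different reasons, so
comparing term by term shows where the difference comes from rather than just
that it exists.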

regards,
--tom



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.