Computer Chess Club Archives



Subject: Re: Evaluation comparative test for Amateur Engines (PROPOSAL)

Author: Klaus Friedel

Date: 13:15:34 02/20/04



On February 20, 2004 at 14:36:28, Jaime Benito de Valle Ruiz wrote:

>Most people keep using test sets to find out how "good" their engines are;
>this gives us a rough idea of their strength. Of course, real games are better
>for this purpose.
>
>Tord Romstad just posted a message asking for opinions regarding a particular
>static evaluation, and many others answered by giving the score given by their
>engines. I find this most interesting:
>Although there is no such thing as a "perfect score" for a position, and the
>playing style of an engine strongly depends on this value, I'm sure most here
>will agree that there must be a sensible range of values to be considered
>reasonable. Surely, these upper and lower bounds could be set tighter or wider
>depending on the nature of the position.
>I could be wrong, but I would expect many engines of similar strength to give
>scores within a reasonable range for a full list of static test positions.
>
>If I (or anyone else) provide you with a list of positions, would you be
>interested in providing the static values that your engine gets for each
>position? If your engine can read EPD files, adapting the code to read each
>position and write another file with the static scores should be fairly
>straightforward.
>I'm sure this information could be extremely useful for many to find potential
>flaws in their engines using an easy automatic process.
>
>We could compile one or more of these static tests (similar to EPDs) and suggest
>a range of values for each position based on the ones given by the strongest
>engines.
>
>Example: (with fictitious scores)
>----------------------------------
>
>Test File:
>
>3r2k1/3b2p1/1q2p2p/4P3/3NQ1P1/bP5P/PpP1N3/1K1R4 w - -; id "Test 0001"
>r2k1b1r/2p2ppp/p3q3/2PpN3/Q2Pn3/4B3/PP3PPP/1R2R1K1 b - -; id "Test 0002"
>1r2r2k/1bq3p1/p1p1Bp1p/1p3Q2/3PP3/1PnN2P1/5P1P/R3R1K1 b - -; id "Test 0003"
>......
>
>Output (3 engines):
>
>      Engine A   Engine B   Engine C            Range
>
>0001:  +0.42       +0.12      +0.52        [ 0.12 ,  0.52]
>0002:  +3.00       +2.83      +3.42        [ 2.83 ,  3.42]
>0003:  -1.23       -0.88      -1.24        [-1.24 , -0.88]
>.....
>
>Let me know if you're interested.

I'm very interested in such a test.
It would be great to also have some good chess players give their estimate of
a reasonable eval range and say which factors they consider most important.
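
To make the idea concrete, here is a rough Python sketch of how the "Range"
column from the quoted example could be built and then used to flag suspicious
evaluations. Everything in it is an assumption for illustration only: the
function names, the short ids ("0001" rather than "Test 0001"), the 0.25-pawn
margin, and the line format (taken from the sample test file above) are made
up, and the scores are the fictitious ones from the example. It does not call
any real engine.

```python
import re

# Assumed line format, copied from the sample test file in the quoted post:
#   <board> <side> <castling> <en passant>; id "<name>"
LINE_RE = re.compile(r'^\s*(?P<position>[^;]+?)\s*;\s*id\s+"(?P<name>[^"]+)"')


def parse_test_line(line):
    """Split one test-file line into (position, id); return None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return m.group("position"), m.group("name")


def build_ranges(engine_scores):
    """Map each position id to (lowest, highest) score over all reference engines.

    engine_scores: {engine_name: {position_id: score_in_pawns}}
    Only positions scored by every engine are included.
    """
    per_engine = list(engine_scores.values())
    common_ids = set.intersection(*(set(scores) for scores in per_engine))
    return {
        pid: (min(s[pid] for s in per_engine), max(s[pid] for s in per_engine))
        for pid in sorted(common_ids)
    }


def outliers(my_scores, ranges, margin=0.25):
    """Return the ids whose score lies outside [low - margin, high + margin]."""
    flagged = []
    for pid, (low, high) in ranges.items():
        score = my_scores.get(pid)
        if score is not None and not (low - margin <= score <= high + margin):
            flagged.append(pid)
    return flagged


# Fictitious scores from the example table in the quoted post.
reference = {
    "Engine A": {"0001": 0.42, "0002": 3.00, "0003": -1.23},
    "Engine B": {"0001": 0.12, "0002": 2.83, "0003": -0.88},
    "Engine C": {"0001": 0.52, "0002": 3.42, "0003": -1.24},
}
ranges = build_ranges(reference)

# Hypothetical scores from an engine under test.
mine = {"0001": 0.30, "0002": 1.50, "0003": -1.60}
suspicious = outliers(mine, ranges)
```

The margin is there because the min/max of only three engines is a very narrow
band; with more reference engines (or ranges suggested by strong players) it
could be reduced or dropped.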

Regards,
Klaus Friedel.






Last modified: Thu, 15 Apr 21 08:11:13 -0700
