Computer Chess Club Archives


Subject: Re: Evaluation comparative test for Amateur Engines (PROPOSAL)

Author: Gerd Isenberg

Date: 13:45:13 02/20/04

On February 20, 2004 at 14:36:28, Jaime Benito de Valle Ruiz wrote:

>Most people keep using test sets to find out how well their engines are doing;
>this gives us a rough idea of their strength. Of course, real games are better
>for this purpose.
>
>Tord Romstad just posted a message asking for opinions regarding a particular
>static evaluation, and many others answered with the scores given by their
>engines. I find this most interesting:
>Although there is no such thing as a "perfect score" for a position, and the
>playing style of an engine strongly depends on these values, I'm sure most here
>will agree that there must be a sensible range of values to be considered
>reasonable. Surely, these upper and lower bounds could be set tighter or wider
>depending on the nature of the position.
>I could be wrong, but I would expect many engines of similar strength to give
>scores within a reasonable range for a full list of static test positions.
>
>If I (or anyone else) provide you with a list of positions, would you be
>interested in providing the static values that your engine gets for each
>position? If your engine can read EPD files, adapting the code to read each
>position and write another file with the static scores should be fairly
>straightforward.
>I'm sure this information could be extremely useful for many to find potential
>flaws in their engines using an easy automatic process.
>
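A minimal sketch in C of the loop described above: read one EPD record per
line, hand the position fields to the engine, and append the score as a "ce"
(centipawn evaluation) opcode. The set_position() and static_eval() hooks are
hypothetical placeholders for your own engine's functions, not anything from
the original post.

#include <stdio.h>
#include <string.h>

/* Placeholder hooks -- replace with your engine's real functions. */
static void set_position(const char *fen) { (void)fen; }
static int  static_eval(void)             { return 0; }

int main(int argc, char **argv)
{
    char line[512];
    FILE *in, *out;

    if (argc < 3) return 1;
    in  = fopen(argv[1], "r");
    out = fopen(argv[2], "w");
    if (!in || !out) return 1;

    while (fgets(line, sizeof line, in)) {
        char *semi;
        line[strcspn(line, "\r\n")] = '\0';  /* strip trailing newline */
        semi = strchr(line, ';');
        if (semi) *semi = '\0';              /* keep board/stm/castling/ep */
        if (line[0] == '\0') continue;       /* skip empty lines */
        set_position(line);                  /* the EPD prefix doubles as a FEN */
        fprintf(out, "%s; ce %d\n", line, static_eval());
    }
    fclose(in);
    fclose(out);
    return 0;
}
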
>We could compile one or more of these static test sets (similar to EPDs) and
>suggest a range of values for each position, based on the scores given by the
>strongest engines.
>
>Example (with fictitious scores):
>----------------------------------
>
>Test File:
>
>3r2k1/3b2p1/1q2p2p/4P3/3NQ1P1/bP5P/PpP1N3/1K1R4 w - -; id "Test 0001"
>r2k1b1r/2p2ppp/p3q3/2PpN3/Q2Pn3/4B3/PP3PPP/1R2R1K1 b - -; id "Test 0002"
>1r2r2k/1bq3p1/p1p1Bp1p/1p3Q2/3PP3/1PnN2P1/5P1P/R3R1K1 b - -; id "Test 0003"
>......
>
>Output (3 engines):
>
>      Engine A   Engine B   Engine C            Range
>
>0001:  +0.42       +0.12      +0.52        [ 0.12 ,  0.52]
>0002:  +3.00       +2.83      +3.42        [ 2.83 ,  3.42]
>0003:  -1.23       -0.88      -1.24        [-1.24 , -0.88]
>.....
>
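The aggregation step might look like the sketch below: it assumes one plain
score file per engine, one centipawn value per line in test-set order (a
made-up layout for illustration), and prints each position's scores together
with the [min, max] band they span.

#include <stdio.h>

#define MAX_ENGINES 16

int main(int argc, char **argv)
{
    FILE *f[MAX_ENGINES];
    int score[MAX_ENGINES];
    int engines = argc - 1, pos = 1, i;

    if (engines < 1 || engines > MAX_ENGINES) return 1;
    for (i = 0; i < engines; i++)
        if ((f[i] = fopen(argv[i + 1], "r")) == NULL) return 1;

    for (;;) {
        int lo, hi, ok = 1;
        for (i = 0; i < engines; i++)
            if (fscanf(f[i], "%d", &score[i]) != 1) ok = 0;
        if (!ok) break;                     /* one of the files ran out */

        lo = hi = score[0];
        printf("%04d:", pos++);
        for (i = 0; i < engines; i++) {
            printf("  %+6.2f", score[i] / 100.0);
            if (score[i] < lo) lo = score[i];
            if (score[i] > hi) hi = score[i];
        }
        printf("    [%+.2f, %+.2f]\n", lo / 100.0, hi / 100.0);
    }
    for (i = 0; i < engines; i++) fclose(f[i]);
    return 0;
}
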
>Let me know if you're interested.
>Regards,
>
>  Jaime

It is interesting.

But does comparing final eval scores make sense, even with a normalized range?
Programs, by which I mean their search and eval, are designed too differently.
There is no information, or only vague information (piece interaction), about
the zillions of evaluation aspects/patterns and their weights; all we see is a
final eval result.
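
To make "a normalized range" concrete, here is one naive scheme (an assumption
for illustration, not something proposed in the thread): rescale each engine's
scores so that its mean absolute score over the test set is exactly one pawn,
which removes stretched or compressed eval scales before ranges are compared.

#include <math.h>
#include <stddef.h>

/* Rescale scores (in pawns) so the mean absolute score becomes 1.0.
   A naive, assumed normalization -- not from the original post. */
void normalize_scores(double *score, size_t n)
{
    double sum = 0.0, scale;
    size_t i;

    for (i = 0; i < n; i++)
        sum += fabs(score[i]);
    if (sum == 0.0)
        return;                       /* all zeros: nothing to rescale */
    scale = (double)n / sum;          /* mean |score| maps to 1.0 */
    for (i = 0; i < n; i++)
        score[i] *= scale;
}

Even after such a rescaling, the objection stands: identical numbers can come
from entirely different weightings of the underlying aspects.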

Eval scores of complex positions with a lot of important aspects are difficult
to interpret, e.g. king-safety issues for both sides, passers, "active piece
play", static/dynamic pawn structures, unbalanced material, weak/strong
squares, and their interactions ...

I prefer to discuss some eval aspects, interactions, and vague implementation
hints with concrete positions from time to time ...

Programmers don't like to share all their eval tricks, for obvious reasons ...

Cheers,
Gerd


