Computer Chess Club Archives


Subject: Re: Evaluation comparative test for Amateur Engines (PROPOSAL)

Author: Robert Hyatt

Date: 18:47:16 02/20/04


On February 20, 2004 at 16:45:13, Gerd Isenberg wrote:

>On February 20, 2004 at 14:36:28, Jaime Benito de Valle Ruiz wrote:
>
>>Most people keep using test sets to find out how "good" their engines are doing;
>>this gives us a rough idea of their strength. Of course, real games are better
>>for this purpose.
>>
>>Tom Romstad just posted a message asking for opinions regarding a particular
>>static evaluation, and many others answered by giving the score given by their
>>engines. I find this most interesting:
>>Although there is no such thing as a "perfect score" for a position, and the
>>playing style of an engine strongly depends on this value, I'm sure most here
>>will agree that there must be a sensible range of values to be considered
>>reasonable. Surely, these upper and lower bounds could be set tighter or wider
>>depending on the nature of the position.
>>I could be wrong, but I would expect many engines of similar strength to give
>>scores within a reasonable range for a full list of static test positions.
>>
>>If I (or anyone else) provide you with a list of positions, would you be
>>interested in providing the static values that your engine gets for each
>>position? If your engine can read EPD files, adapting the code to read each
>>position and write another file with the static scores should be fairly
>>straightforward.
>>I'm sure this information could be extremely useful for many to find potential
>>flaws in their engines using an easy automatic process.
>>
>>We could compile one or more of these static tests (similar to EPDs) and suggest
>>a range of values for each position based on the ones given by the strongest
>>engines.
>>
>>Example: (with fictitious scores)
>>----------------------------------
>>
>>Test File:
>>
>>3r2k1/3b2p1/1q2p2p/4P3/3NQ1P1/bP5P/PpP1N3/1K1R4 w - -; id "Test 0001"
>>r2k1b1r/2p2ppp/p3q3/2PpN3/Q2Pn3/4B3/PP3PPP/1R2R1K1 b - -; id "Test 0002"
>>1r2r2k/1bq3p1/p1p1Bp1p/1p3Q2/3PP3/1PnN2P1/5P1P/R3R1K1 b - -; id "Test 0003"
>>......
>>
>>Output (3 engines):
>>
>>      Engine A   Engine B   Engine C            Range
>>
>>0001:  +0.42       +0.12      +0.52        [ 0.12 ,  0.52]
>>0002:  +3.00       +2.83      +3.42        [ 2.83 ,  3.42]
>>0003:  -1.23       -0.88      -1.24        [-1.24 , -0.88]
>>.....
>>
>>Let me know if you're interested.
>>Regards,
>>
>>  Jaime
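
What Jaime proposes splits into two small tools: each engine dumps its static
score for every position, and a separate script merges the dumps into the
per-position [min, max] range shown in his example. A minimal sketch of the
merging step in Python follows; the one-"<id> <score>"-pair-per-line dump
format and the file names are assumptions made here, not anything fixed in
the thread.

    import sys

    def read_scores(path):
        """Read one engine's dump; assumed format is one '<id> <score>' pair
        per line, e.g. 'Test 0001 +0.42' (hypothetical, not from the thread)."""
        scores = {}
        with open(path) as f:
            for line in f:
                parts = line.rsplit(maxsplit=1)
                if len(parts) != 2:
                    continue
                try:
                    scores[parts[0]] = float(parts[1])
                except ValueError:
                    pass  # skip lines whose last field is not a number
        return scores

    def main(paths):
        per_engine = [read_scores(p) for p in paths]
        ids = sorted(set().union(*(s.keys() for s in per_engine)))
        # Header: one column per engine file, plus the [min, max] range.
        print("id".ljust(12) + "".join(p.ljust(14) for p in paths) + "range")
        for pid in ids:
            vals = [s[pid] for s in per_engine if pid in s]
            cols = "".join(
                (f"{s[pid]:+.2f}" if pid in s else "n/a").ljust(14)
                for s in per_engine)
            print(f"{pid.ljust(12)}{cols}[{min(vals):+.2f}, {max(vals):+.2f}]")

    if __name__ == "__main__":
        # e.g.:  python ranges.py engineA.txt engineB.txt engineC.txt
        main(sys.argv[1:])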
>
>It is interesting.
>
>But does comparing final eval scores make sense, even with a normalized range?
>Programs, I mean search and eval, are designed too differently.
>There is no, or only vague, information (piece interaction) about all the
>zillions of aspects/patterns and weights of the evaluation, only a final eval
>result.
>
>Eval scores of complex positions with a lot of important aspects are difficult
>to interpret, e.g. king-safety issues for both sides, passers, "active piece
>play", static/dynamic pawn structures, unbalanced material, weak/strong squares
>and their interactions ...
>
>I prefer to discuss some eval aspects, interactions and vague implementation
>hints with concrete positions from time to time...
>
>Programmers don't like to share all their eval tricks, for obvious reasons ...

"some" do.

:)





>
>Cheers,
>Gerd


