Author: Robert Hyatt
Date: 18:47:16 02/20/04
On February 20, 2004 at 16:45:13, Gerd Isenberg wrote:

>On February 20, 2004 at 14:36:28, Jaime Benito de Valle Ruiz wrote:
>
>>Most people keep using test sets to find out how "good" their engines are
>>doing; this gives us a rough idea of their strength. Of course, real games
>>are better for this purpose.
>>
>>Tord Romstad just posted a message asking for opinions regarding a
>>particular static evaluation, and many others answered by giving the scores
>>from their own engines. I find this most interesting: although there is no
>>such thing as a "perfect score" for a position, and the playing style of an
>>engine strongly depends on this value, I'm sure most here will agree that
>>there must be a sensible range of values to be considered reasonable.
>>Surely, these upper and lower bounds could be set tighter or wider depending
>>on the nature of the position. I could be wrong, but I would expect many
>>engines of similar strength to give scores within a reasonable range for a
>>full list of static test positions.
>>
>>If I (or anyone else) provide you with a list of positions, would you be
>>interested in providing the static values that your engine gets for each
>>position? If your engine can read EPD files, adapting the code to read each
>>position and write another file with the static scores should be fairly
>>straightforward. I'm sure this information could be extremely useful for
>>many to find potential flaws in their engines using an easy automatic
>>process.
>>
>>We could compile one or more of these static test sets (similar to EPDs)
>>and suggest a range of values for each position based on the ones given by
>>the strongest engines.
>>
>>Example (with fictitious scores):
>>----------------------------------
>>
>>Test File:
>>
>>3r2k1/3b2p1/1q2p2p/4P3/3NQ1P1/bP5P/PpP1N3/1K1R4 w - -; id "Test 0001"
>>r2k1b1r/2p2ppp/p3q3/2PpN3/Q2Pn3/4B3/PP3PPP/1R2R1K1 b - -; id "Test 0002"
>>1r2r2k/1bq3p1/p1p1Bp1p/1p3Q2/3PP3/1PnN2P1/5P1P/R3R1K1 b - -; id "Test 0003"
>>......
>>
>>Output (3 engines):
>>
>>        Engine A   Engine B   Engine C   Range
>>
>>0001:    +0.42      +0.12      +0.52     [ 0.12 ,  0.52]
>>0002:    +3.00      +2.83      +3.42     [ 2.83 ,  3.42]
>>0003:    -1.23      -0.88      -1.24     [-1.24 , -0.88]
>>.....
>>
>>Let me know if you're interested.
>>Regards,
>>
>>  Jaime
>
>It is interesting.
>
>But does comparing final eval scores make sense, even with a normalized
>range? Programs, I mean their search and eval, are designed too differently.
>There is no information, or only vague information (piece interaction),
>about all the zillions of aspects/patterns and weights of the evaluation;
>only a final eval result.
>
>Eval scores of complex positions with a lot of important aspects are
>difficult to interpret, e.g. king-safety issues for both sides, passers,
>"active piece play", static/dynamic pawn structures, unbalanced material,
>weak/strong squares and their interactions ...
>
>I prefer to discuss some eval aspects, interactions and vague implementation
>hints with concrete positions from time to time ...
>
>Programmers don't like to share all their eval tricks for obvious reasons ...

"some" do. :)

>Cheers,
>Gerd
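[Editor's note: a minimal sketch of the EPD-driven harness Jaime describes,
assuming a hypothetical engine API -- setup_position() taking a FEN string and
evaluate() returning centipawns from the side to move's point of view; neither
is part of any real engine, so substitute your own entry points.]

/* EPD static-eval harness sketch.  Reads one position per line,
 * drops the EPD opcodes after ';', and prints the static score
 * in the "0001: +0.42" format from the example above. */
#include <stdio.h>
#include <string.h>

extern void setup_position(const char *fen);  /* hypothetical */
extern int  evaluate(void);                   /* hypothetical; centipawns */

int main(int argc, char **argv) {
    char line[512];
    int  n = 0;
    FILE *in = fopen(argc > 1 ? argv[1] : "static.epd", "r");
    if (!in) { perror("epd file"); return 1; }
    while (fgets(line, sizeof line, in)) {
        line[strcspn(line, "\n")] = '\0';     /* strip newline */
        char *semi = strchr(line, ';');       /* opcodes follow ';' */
        if (semi) *semi = '\0';               /* keep only the position */
        if (line[0] == '\0') continue;        /* skip blank lines */
        setup_position(line);
        printf("%04d: %+.2f\n", ++n, evaluate() / 100.0);
    }
    fclose(in);
    return 0;
}

[Run once per engine over the shared EPD file; the suggested [min, max] range
for each id then falls out of the collected per-engine columns.]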