Computer Chess Club Archives



Subject: Re: calibrating the evaluation function

Author: Komputer Korner

Date: 05:45:54 07/31/98



On July 31, 1998 at 07:55:18, Jay Scott wrote:

>
>On July 31, 1998 at 01:51:57, blass uri wrote:
>
>>The programs I know give me evaluation in pawns and I prefer to see
>>in the evaluation function the predicted result of the game(number between 0
>>and 1) and not an evaluation in pawns.
>
>For my part, I'd prefer a probability distribution giving the chances
>of a win, loss or draw. But the chess programmers don't seem to have
>any plans for it. What you are asking for can be called the equity of
>the position. It's (probability of win) + 0.5 * (probability of draw),
>if you assume that a draw is worth 0.5. (In a tournament or match, the
>value of a draw may be more or less than 0.5, depending on the
>tournament or match situation.)
>
>You can use Komputer Korner's table (from his posting in this thread)
>to get a rough idea of how to convert a score from pawns to equity.
>However, it may be that every program is different. If so, you'll have
>to calibrate each program you're interested in separately.
>
>One way to estimate a program's score->equity conversion is by
>having the program play a lot of games against itself (it should be
>against an equal opponent, and what opponent is more equal than
>itself?). Divide the range of scores into intervals, maybe 0-0.2,
>0.201-0.3, etc. For each interval, count up the number of times that
>a score in that interval occurred in won, lost, and drawn games.
>Then you know what a score in that interval means.
>
>You need a lot of games to make the statistics valid. It would be nice
>to automate the process. For example, to do it with Crafty you'd
>want to write a program that reads Crafty's log and adds up all
>the numbers.
>
>I'd like to recommend this exercise to chess programmers as a way
>to test the meaning and validity of their evaluation functions.
>You can also use it to examine individual evaluation factors.
>For example, if you're wondering about your two-bishops bonus,
>you can run the numbers only for positions where one side has
>the advantage of two bishops. If the bonus is too big, you should
>expect to see a flatter curve, or a shifted curve, as the score goes
>up more than the chance of winning. You're less likely to see an
>effect if the bonus is too small, because the side with two bishops
>will be willing to give them up without taking full advantage of them.
>
>Summary: You get more information from the detailed behavior of the
>evaluation function in test games than you get from only the results
>of the games.
>
>  Jay
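
For concreteness, Jay's equity formula is a one-liner. Here it is as Python,
with made-up probabilities purely for illustration:

    # Equity of a position, assuming a draw is worth 0.5
    # (as in the formula quoted above).
    def equity(p_win, p_draw):
        return p_win + 0.5 * p_draw

    # Made-up example: 40% win, 40% draw, 20% loss.
    print("%.2f" % equity(0.40, 0.40))   # prints 0.60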
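The counting scheme Jay describes is also easy to sketch. The following
Python is a rough illustration only: the file name, the one-record-per-line
format ("score result", e.g. "0.27 0.5"), and the bin width are all invented
for the example, since every program logs differently:

    import math
    from collections import defaultdict

    BIN_WIDTH = 0.25  # size of each score interval, in pawns

    # bin index -> [wins, draws, losses]
    counts = defaultdict(lambda: [0, 0, 0])

    # Hypothetical input: one line per recorded position, holding the
    # engine's score in pawns and the eventual game result
    # (1 = win, 0.5 = draw, 0 = loss).
    with open("scores.txt") as f:
        for line in f:
            score, result = line.split()
            b = math.floor(float(score) / BIN_WIDTH)  # interval the score falls in
            r = float(result)
            if r == 1:
                counts[b][0] += 1
            elif r == 0.5:
                counts[b][1] += 1
            else:
                counts[b][2] += 1

    # Empirical equity per interval: (wins + 0.5 * draws) / games.
    for b in sorted(counts):
        w, d, l = counts[b]
        n = w + d + l
        print("%.2f..%.2f: %d games, equity %.3f"
              % (b * BIN_WIDTH, (b + 1) * BIN_WIDTH, n, (w + 0.5 * d) / n))

With enough games, the printed table is exactly the score-to-equity
conversion Jay asks for, one row per score interval.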
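Restricting the run to one evaluation term, as Jay suggests for the
two-bishops bonus, only takes a filter on the input. Assuming the logger
also records which pieces are on the board (again, a hypothetical helper,
not any program's real format):

    def has_bishop_pair_edge(pieces):
        # pieces: string of piece letters, e.g. "KQBBNPPPkqbnppp";
        # True when exactly one side has two or more bishops.
        return (pieces.count("B") >= 2) != (pieces.count("b") >= 2)

    # e.g. keep only the flagged positions before binning them:
    # records = [r for r in records if has_bishop_pair_edge(r.pieces)]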


You are forgetting about one thing. AAAAAgh, I hate to bring it up again:
asymmetry. If the program has any asymmetric code in it, then the evaluation
score is useless for the purposes of your experiment. If a program evaluates a
pawn as 100 units and the piece values correspond to this, then as long as
there is no asymmetric code, my annotation table gives a true picture of the
expected result, assuming that the evaluation score is accurate. Almost all of
the programmers will tell you not to trust the evaluation score, but we all do
at some point. I have found that Junior 4.6 is the most accurate at evaluating
a position. Note that a very accurate (okay, it isn't perfect) position score
doesn't guarantee that the program will win every game. Other factors, like
knowledge, opening book, and search efficiency, come into play in a big way.

It is possible for a program to have less knowledge and yet be more accurate
in its positional score, for two reasons: 1) it has less asymmetry than the
other program; 2) the extra knowledge of the other program comes into play
only in exceptional positions, and elsewhere that same body of knowledge does
not produce as accurate a picture of the position as the first program's.
Also, since all the engine algorithms are different, each shines in particular
positions, so the end results are complex. And some programs are root
evaluators, others are end-node evaluators, and others are in between. Bob
Hyatt can probably give you a dozen more reasons why different programs come
up with different evaluations of the same position.
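
To see why asymmetry poisons such a calibration, consider a toy evaluation
(numbers and names made up for illustration) that gives White, and only
White, a bonus for the bishop pair. Mirroring the colours should negate the
score; with the one-sided term it doesn't, so scores for the two colours sit
on different scales, and no single score-to-equity table fits both:

    # Toy illustration of asymmetric evaluation (invented numbers).
    # white_units/black_units: material in centipawns;
    # white_pair/black_pair: whether each side has the bishop pair.
    def toy_eval(white_units, black_units, white_pair, black_pair):
        score = white_units - black_units
        if white_pair:          # asymmetric: only White gets the bonus
            score += 30
        return score

    # Mirror test: swap the colours and negate. A symmetric eval would
    # print 30 twice; this asymmetric one prints 30, then 0.
    print(toy_eval(800, 800, True, False))   # White has the pair: 30
    print(-toy_eval(800, 800, False, True))  # mirrored position: 0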
--
Komputer Korner


