Author: Komputer Korner
Date: 05:45:54 07/31/98
On July 31, 1998 at 07:55:18, Jay Scott wrote:

>On July 31, 1998 at 01:51:57, blass uri wrote:
>
>>The programs I know give me evaluation in pawns and I prefer to see
>>in the evaluation function the predicted result of the game (a number
>>between 0 and 1) and not an evaluation in pawns.
>
>For my part, I'd prefer a probability distribution giving the chances
>of a win, loss or draw. But the chess programmers don't seem to have
>any plans for it. What you are asking for can be called the equity of
>the position. It's (probability of win) + 0.5 * (probability of draw),
>if you assume that a draw is worth 0.5. (In a tournament or match, the
>value of a draw may be more or less than 0.5, depending on the
>tournament or match situation.)
>
>You can use Komputer Korner's table (from his posting in this thread)
>to get a rough idea of how to convert a score from pawns to equity.
>However, it may be that every program is different. If so, you'll have
>to calibrate each program you're interested in separately.
>
>One way to estimate a program's score->equity conversion is by
>having the program play a lot of games against itself (it should be
>against an equal opponent, and what opponent is more equal than
>itself?). Divide the range of scores into intervals, maybe 0-0.2,
>0.201-0.3, etc. For each interval, count up the number of times that
>a score in that interval occurred in won, lost, and drawn games.
>Then you know what a score in that interval means.
>
>You need a lot of games to make the statistics valid. It would be nice
>to automate the process. For example, to do it with Crafty you'd
>want to write a program that reads Crafty's log and adds up all
>the numbers.
>
>I'd like to recommend this exercise to chess programmers as a way
>to test the meaning and validity of their evaluation functions.
>You can also use it to examine individual evaluation factors.
>For example, if you're wondering about your two-bishops bonus,
>you can run the numbers only for positions where one side has
>the advantage of the two bishops. If the bonus is too big, you should
>expect to see a flatter curve, or a shifted curve, as the score goes
>up more than the chance of winning. You're less likely to see an
>effect if the bonus is too small, because the side with the two bishops
>will be willing to give them up without taking full advantage of them.
>
>Summary: You get more information from the detailed behavior of the
>evaluation function in test games than you get from only the results
>of the games.
>
> Jay

You are forgetting about one thing. Aaaagh, I hate to bring it up again: asymmetry. If the program has any asymmetric code in it, then the evaluation score is useless for the purposes of your experiment. If a program evaluates a pawn as 100 units and the piece values correspond to this, then as long as there is no asymmetric code, my annotation table gives a true picture of the expected result, assuming that the evaluation score is accurate. Almost all of the programmers will tell you not to trust the evaluation score, but we all do at some point.

I have found that Junior 4.6 is the most accurate at evaluating a position. Note that a very accurate (okay, it isn't perfect) position score doesn't guarantee that the program will win every game. Other factors like knowledge, opening book, and search efficiency come into play in a big way. It is possible for a program to have less knowledge and yet be more accurate in its positional score, for two reasons: 1) it has less asymmetry than the other program; 2) the other program's extra knowledge only comes into play in exceptional positions, and the same body of knowledge in that other program does not produce as accurate a picture of the position as the first program's. Also, since all the engine algorithms are different, they all shine on particular positions, so the end results are complex.
Also, some programs are root evaluators, others are end-node evaluators, and others are in between. Bob Hyatt can probably give you a dozen more reasons why different programs come up with different evaluations of the same position. -- Komputer Korner
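Jay's calibration procedure is straightforward to automate. The sketch below is my own illustration, not from the thread: it bins hypothetical (score, result) pairs from self-play games into fixed-width score intervals and reports the observed equity per bin as (wins + 0.5 * draws) / games. The bin width, the sample data, and the function name are all made-up assumptions; a real run would feed in thousands of games harvested from engine logs.

```python
from collections import defaultdict

def equity_by_score_bin(games, bin_width=0.25):
    """Estimate a score -> equity mapping from (score, result) pairs.

    `games` is a list of (score, result) tuples: `score` is the engine's
    evaluation in pawns, `result` is 1.0 for a win, 0.5 for a draw,
    0.0 for a loss, from the evaluating side's point of view.
    Returns {bin_lower_edge: (observed_equity, game_count)}.
    """
    points = defaultdict(float)
    counts = defaultdict(int)
    for score, result in games:
        edge = (score // bin_width) * bin_width  # lower edge of the interval
        points[edge] += result                   # a draw contributes 0.5
        counts[edge] += 1
    return {edge: (points[edge] / counts[edge], counts[edge])
            for edge in sorted(counts)}

# Hypothetical self-play data: (evaluation in pawns, game result).
sample = [(0.1, 0.5), (0.3, 0.5), (0.4, 1.0), (0.8, 1.0),
          (0.9, 0.5), (1.3, 1.0), (1.4, 1.0), (-0.6, 0.0)]
for edge, (equity, n) in equity_by_score_bin(sample).items():
    print(f"[{edge:+.2f}, {edge + 0.25:+.2f}): equity {equity:.2f} over {n} game(s)")
```

With enough games per bin, the table this produces is exactly the pawns-to-equity conversion Jay describes; a flatter-than-expected curve in positions selected for some feature (say, the two bishops) would suggest that feature's bonus is mistuned.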