Author: Jay Scott
Date: 14:23:52 02/25/98
On February 25, 1998 at 05:09:41, Amir Ban wrote:

>Let me make a try at this: The evaluation of a position should be a
>measure of the expected outcome of the game (with assumed perfect play),
>i.e. it can be mapped to a probability of winning. Say, with a score of
>+0.5 you expect to win 65%, and with a score of -4 you expect to win
>0.6%. The probability of winning should be monotonic with the score, or
>else something is bad with the function. So one way to define a better
>evaluation is if it is more monotonic. You can also actually decide on a
>score-to-probability mapping with some exponential, say, and declare that
>the better evaluation is the one that fits the mapping better.
>
>I think this definition is on the right track, but there is something
>clearly wrong with it: First, there is an almost infinite number of such
>functions that would give you a perfect fit, but most of them are
>nonsense. For example, an evaluation function that always returns 0 is
>perfect in this sense, but obviously useless. A less extreme example is
>an evaluation that limits itself to scores from +0.5 to -0.5, but does
>that perfectly. This is a perfect but wishy-washy evaluation, so not
>very useful, because practically you need to know that capturing the
>queen gives you a 99.9% win, and this evaluation never guarantees more
>than, say, 70%. Of course, if you are already a piece ahead, this
>evaluation gives you no guidance at all.

I think this is a different topic, so I replied separately.

It's like the weather report. If it says there's a 30% chance of rain tomorrow, then 30% of the time it rains (it's true!). Meteorologists are said to be "well-calibrated" in that sense. Ideally you'd like to hear either that there's a 100% chance of rain or a 0% chance, but in practice, if they did that, they'd be wrong a lot of the time. A chess program is in an exactly analogous position: it would like to be as discriminating as possible while staying well-calibrated.
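The exponential score-to-probability mapping Amir describes is commonly taken to be a logistic curve. A minimal sketch, where the steepness constant k is an illustrative assumption (in practice you would fit it from game data; a value near 1.2 roughly reproduces the 65% and sub-1% figures quoted above):

```python
import math

def win_probability(score, k=1.24):
    """Map an evaluation score (in pawns) to an expected probability of
    winning, using a logistic curve. The steepness k is a free parameter
    chosen here for illustration; fit it against real game outcomes."""
    return 1.0 / (1.0 + math.exp(-k * score))

print(win_probability(0.5))   # about 0.65
print(win_probability(-4.0))  # under 1%
```

Because the logistic is strictly increasing, this mapping is automatically monotonic in the score, which is the first property Amir asks for.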
I don't think it's a serious problem in practice. You can measure everything from game data.

There is another degenerate case: the evaluation that says every position is a loss. A program that uses it will probably find that it is accurate (seeing no reason to prefer one move over another, it plays hopelessly and loses, confirming its own prediction). :-) This is a practical difficulty in temporal difference learning, though it's not too hard to work around.

Jay
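Measuring calibration from game data, in the weather-report sense above, amounts to bucketing positions by predicted win probability and comparing each bucket's empirical win rate. A sketch under my own assumptions (the function name and equal-width bucketing are illustrative):

```python
def calibration_table(predictions, outcomes, n_buckets=10):
    """Group (predicted win probability, actual result) pairs into
    equal-width buckets and report, for each non-empty bucket, its
    midpoint, the empirical win rate, and the sample count. A
    well-calibrated evaluation shows win rates close to the midpoints."""
    buckets = [[] for _ in range(n_buckets)]
    for p, won in zip(predictions, outcomes):
        i = min(int(p * n_buckets), n_buckets - 1)  # clamp p == 1.0
        buckets[i].append(won)
    return [((i + 0.5) / n_buckets, sum(b) / len(b), len(b))
            for i, b in enumerate(buckets) if b]

# E.g. ten positions all predicted at 30%, of which three were won:
table = calibration_table([0.3] * 10, [1, 1, 1, 0, 0, 0, 0, 0, 0, 0])
print(table)  # one bucket, empirical rate 0.3 -- well-calibrated
```

The two degenerate evaluations from the quoted post both pass this test, which is exactly the point: calibration alone does not reward discrimination, so you also want the predictions spread toward 0 and 1.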