Computer Chess Club Archives



Subject: Re: brilliant post! Maybe comps should show margin of error with evaluation?

Author: Vasik Rajlich

Date: 03:08:55 01/03/06

Please find my comments below...

On January 03, 2006 at 05:02:53, Dagh Nielsen wrote:

>On January 03, 2006 at 00:37:25, Joseph Ciarrochi wrote:
>
>>
>>>
>>>It depends on the stage of the game: a static evaluation of less than
>>>+1.0 from a computer engine is less reliable in the opening than the
>>>same evaluation in the endgame. Thus, an evaluation of less than 1.0 is
>>>not a severe disadvantage in practical chess terms, because there is
>>>often some dynamic compensation for Black. Dynamic compensation is not a
>>>static value; it can change completely as the position unfolds toward a
>>>more static one. Hence, a static evaluation of 1.0 is a great advantage
>>>when the position is devoid of dynamics or approaches a more static
>>>phase of the game, for example the transition to an endgame. Even in an
>>>endgame, a +1.0 advantage for either side is not very much in a rook
>>>ending, whereas in a minor-piece ending (knight vs. bishop) a +1.0
>>>advantage can be enough to convert into a bigger one. My recommendation
>>>is to stop evaluating the position from a STATIC point of view and start
>>>evaluating it from a more DYNAMIC point of view; this is what IM Andrew
>>>Martin recommends.
>>>
>>>My 2 cents,
>>>
>>>Laurence
>>
>>This is a brilliant post, Laurence! Thanks for that.
>>
>>
>>It would be great if the computers could capture the uncertainty of the
>>evaluation (e.g., a particular rook ending might be +1 with a plus/minus .8
>>confidence interval, whereas a knight ending might often be +1 with a
>>plus/minus .2 confidence interval). Is this possible?
>
>In my understanding, a good evaluation function already integrates "confidence".
>This is very important when the engine has to decide which endings to enter.
>Opposite-colored bishop endings are drawish, so +1 pawn there should not give
>the same score as +1 pawn in a knight ending. Queen endings are drawish too,
>etc.
>
>In a deeper sense, an evaluation may be better understood as a prediction of
>the result than as a counter of an n-pawn advantage. The task is then to
>calibrate the evaluation, depending e.g. on the piece set and extra-material
>factors, so that the engine sorts its predictions well.
>
>There is a good reason to build an evaluation function around the value of a
>pawn, namely that all the experience gathered over hundreds of years about
>relative piece values can then be added straightforwardly. But I wonder if
>there is any merit to the idea of letting the evaluation give a predicted
>percentage score directly instead (that is, a position is evaluated at 63% for
>some white advantage). One main idea is to get into the habit of THINKING in
>this way while tuning the evaluation function, maybe by collecting stats and
>processing them in relation to feature bonuses (the point being that a
>[-320, +320] interval can of course be mapped more or less straightforwardly
>onto a 0%-100% interval anyway).
>
>But also, once you do it this way, you get some kind of absolute point of
>reference. Say, you have worked a lot on rook ending evaluation. You take a
>large set of random rook endings with white one pawn up, and average your static
>evaluation of these positions and end up with 80%. BUT when played out by top
>engines, the score is only 63%. THEN your evaluation is probably misleading; it
may be good for sorting predictions within +1 pawn rook endings, but once your
engine has to compare evaluations of rook endings vs., for instance, knight
>endings, you would probably have a problem. But at least you KNOW you have a
>problem :-) If the evaluation is based on the usual pawn unit instead and you
>have no standard conversion to percentage predictions, there really is no way to
>decide that the 0.93 average is too high and should be somehow calibrated down
>to 0.72 instead in order for the engine to compare rook ending advantages with
>knight ending advantages appropriately.
>

Indeed, Rybka already thinks in winning % rather than centipawns. It's just that
current GUIs expect centipawns, so this is what is output. (This may change.)

(The only minor issue here is "distance to mate".)
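
(A minimal sketch of such a mapping, for illustration only: the logistic
shape and the 400-centipawn constant below are borrowed from the Elo
expectation formula, not taken from anything Rybka actually does.)

    // Illustrative centipawn <-> winning-percentage conversion. The 400
    // constant is an assumption borrowed from the Elo formula, not Rybka's.
    #include <cmath>

    double cp_to_win_pct(double cp) {
        return 1.0 / (1.0 + std::pow(10.0, -cp / 400.0));  // in 0.0 .. 1.0
    }

    double win_pct_to_cp(double p) {
        return -400.0 * std::log10(1.0 / p - 1.0);         // inverse mapping
    }

With this particular constant, +1.00 (100 centipawns) maps to about 64%,
incidentally close to the 63% rook-ending figure quoted above.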

>But back to confidence... While a static evaluation can be given an easy
>interpretation as predicted score, the issue gets a lot messier when dealing
>with root position evaluation from min-max of leaf node evaluations.
>
>A typical example: Would you rather play a move evaluated at depth 4 at +1.00
>than another move evaluated at depth 12 at +0.80?
>

Indeed, a very good (and tricky) point. Chrilly posted about this here a few
weeks ago.

If you search deeply enough, all advantages which are insufficient to win will
drop to 50%, and all advantages which are sufficient to win will increase to
100%.

This happens on a smaller scale when comparing 5 ply searches with 10 ply
searches.
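
(A toy way to picture this, with entirely made-up constants - not engine
code: treat the predicted score as a function of both the winning percentage
and the depth that produced it, shrinking shallow scores toward 50%.)

    // Toy model of the depth effect described above; the trust() shape and
    // its constant are invented for the example.
    #include <cmath>

    double trust(int depth) {          // 0.0 at depth 0, tends to 1.0 deep
        return 1.0 - std::exp(-depth / 8.0);
    }

    double predicted_score(double win_pct, int depth) {
        return 0.5 + (win_pct - 0.5) * trust(depth);
    }

Fed through the mapping sketched earlier, +1.00 at depth 4 comes out around
0.56 while +0.80 at depth 12 comes out around 0.59, so under these invented
numbers the deeper +0.80 is preferable - one way of answering the depth-4
vs. depth-12 question above.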

>The answer is simple, but the consequences for engine implementation are not :-)
>The main reasons:
>
>1) Most moves in the tree are not precisely evaluated, but are only given
>bounds on their evaluation ("worse than such-and-such", courtesy of
>alpha-beta).
>
>2) Engines use iterative search anyway and have a current global reference
>depth.
>

Yes, but some variations are searched deeper - the tree is imbalanced, there is
no way around it.
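
(To make Dagh's point 1 concrete, a bare-bones fail-soft alpha-beta over a
toy explicit game tree - illustration only, with no move ordering,
extensions or hashing. The thing to notice is that after a cutoff the
returned score is only a bound on the true value, not the value itself.)

    // Negamax alpha-beta on an explicit tree; leaf values are from the
    // perspective of the side to move at that leaf.
    #include <algorithm>
    #include <cmath>
    #include <vector>

    struct Node {
        double static_eval = 0.0;      // used at leaves
        std::vector<Node> children;    // empty => leaf
    };

    double alpha_beta(const Node& n, double alpha, double beta) {
        if (n.children.empty())
            return n.static_eval;                    // exact leaf score
        double best = -INFINITY;
        for (const Node& child : n.children) {
            double score = -alpha_beta(child, -beta, -alpha);
            best = std::max(best, score);
            alpha = std::max(alpha, best);
            if (alpha >= beta)        // cutoff: the rest go unsearched, so
                break;                // "best" is only a lower bound here
        }
        return best;  // exact only if no cutoff pruned a relevant line
    }

Called as alpha_beta(root, -INFINITY, INFINITY), the root score is exact,
but the vast majority of interior nodes fail high or low and yield only the
bounds described in point 1.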

>The big task seems to be to find a way to interpret the profile of an
>alpha-beta generated tree in smart ways that let you make "confident"
>adjustments to evaluations and confident decisions about the value of
>different search directions.
>
>One interesting question to pose is, given a tree of analysis with leaf node
>evaluations, which node would you expand in order to "improve" the tree of
>analysis the most, if given a free choice, but only one node to expand?
>

Yes, this is the basic search question. Inside an iterative framework, it is
rephrased as: "what is the value of expanding this node?".

>And, a follow-up question: exactly how would you measure the success of such
>a decision? That it gives you the greatest improvement in confidence that your
>tree of analysis will decide on the "ultimately right" move one step ahead?
>(Consider: best confidence is not the same as most profitable; the possibly
>fatal consequences of different decisions are not measured and integrated in
>the question.) What if you can expand 100 nodes, one at a time - will the best
>one-step strategy lead to the best "after 100 steps" strategy? Not
>necessarily, but how often?
>

The value of expanding a node in the tree is 0 if the root move does not change
as a result of the expansion, and the difference in score between the old root
move and the new root move if it does change.

The expected value of expanding a node in the tree is the chance that doing so
changes the move at the root multiplied by the expected superiority of the new
move.
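
(The two statements above transcribe almost literally into code - a sketch,
of course: where the probability and superiority estimates would come from
is exactly the hard, open modeling question.)

    #include <cstddef>
    #include <vector>

    struct ExpansionEstimate {
        double p_root_move_changes;   // P(expanding this node flips the root move)
        double expected_superiority;  // E[new root score - old root score | flip]
    };

    double expected_expansion_value(const ExpansionEstimate& e) {
        return e.p_root_move_changes * e.expected_superiority;
    }

    // Greedy one-step policy for the "only one node to expand" question:
    // pick the candidate with the highest expected value. Assumes cand is
    // non-empty.
    std::size_t best_node_to_expand(const std::vector<ExpansionEstimate>& cand) {
        std::size_t best = 0;
        for (std::size_t i = 1; i < cand.size(); ++i)
            if (expected_expansion_value(cand[i]) >
                expected_expansion_value(cand[best]))
                best = i;
        return best;
    }

As Dagh notes above, whether this greedy one-step rule is also the best
100-step rule is a separate question.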

>Sigh, all this is highly dynamic and chaotic and holistic and... In the end,
>each move just has 3 possible principally correct evaluations (win, draw, or
>loss), and the rest is just qualified speculation. Some strategies work well,
>others don't.
>
>Conspiracy search is one attempt to tackle the above considerations. I would be
>interested to hear what Vasik Rajlich has to add about that :P
>

Sorry, no comment here :)
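
(Background, since the term goes otherwise unexplained in the thread:
conspiracy numbers, introduced by David McAllester, count how many leaf
evaluations would have to change - "conspire" - before a node's minimax
value could change, which is exactly a confidence measure of the kind
discussed above. A minimal sketch of the counting rule on a toy explicit
tree:)

    #include <algorithm>
    #include <climits>
    #include <vector>

    struct CNode {
        double value = 0.0;            // static evaluation, used at leaves
        bool maximizing = true;        // side to move at this node
        std::vector<CNode> children;   // empty => leaf
    };

    // Fewest leaves that must change before this node's minimax value can
    // reach at least v. Lowering it to at most v is the mirror image (swap
    // the min and sum roles below).
    int conspirators_to_raise(const CNode& n, double v) {
        if (n.children.empty())
            return n.value >= v ? 0 : 1;      // one leaf change suffices
        if (n.maximizing) {
            int fewest = INT_MAX;             // raising any one child is enough
            for (const CNode& c : n.children)
                fewest = std::min(fewest, conspirators_to_raise(c, v));
            return fewest;
        }
        int total = 0;                        // min node: every child must rise
        for (const CNode& c : n.children)
            total += conspirators_to_raise(c, v);
        return total;
    }

Conspiracy search then expands leaves so as to drive these counts up for the
values that matter at the root, i.e. it spends effort exactly where the
evaluation is least "confident".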

Best regards,
Vas

>Regards,
>Dagh Nielsen
>

>
>>
>>I have another question: why are computers more prone to error in dynamic
>>positions? Rybka at 19 ply still sees +0.8 for White. If Rybka searches this
>>deep, doesn't it see how the position changes?
>>
>>Still a little confused :(
>>
>>best
>>Joseph


