Author: Vasik Rajlich
Date: 03:08:55 01/03/06
Please find my comments below ..

On January 03, 2006 at 05:02:53, Dagh Nielsen wrote:

>On January 03, 2006 at 00:37:25, Joseph Ciarrochi wrote:
>
>>>It depends on the stage or phase of the game: a static evaluation of less
>>>than +1.0 by a computer engine is less reliable in the opening than in the
>>>endgame. An evaluation of less than +1.0 is thus not a severe disadvantage
>>>in practical chess terms, because Black often has some dynamic compensation.
>>>Dynamic compensation is not a static value; it can change completely as the
>>>position unfolds into a more static one. Hence a static evaluation of +1.0
>>>is a great advantage when the position is devoid of dynamics or approaches a
>>>more static phase of the game, for example the transition to an endgame.
>>>Even in an endgame, an advantage of +1.0 for either side is not much in a
>>>rook ending; in a minor-piece ending (knight vs. bishop), a +1.0 advantage
>>>can be enough to convert into a bigger one. My recommendation is to stop
>>>evaluating the position from a STATIC point of view and start evaluating it
>>>from a DYNAMIC point of view; this is what IM Andrew Martin is recommending.
>>>
>>>My 2 cents,
>>>
>>>Laurence
>>
>>This is a brilliant post, Laurence! Thanks for that.
>>
>>It would be great if the computers could capture the uncertainty of the
>>evaluation (e.g., a particular rook ending might be +1 with a plus/minus 0.8
>>confidence interval, whereas a knight ending might often be +1 with a
>>plus/minus 0.2 confidence interval). Is this possible?
>
>In my understanding, a good evaluation function already integrates
>"confidence". This is very important when the engine has to decide which
>endings to enter. Opposite-colored bishop endings are drawish, so being one
>pawn up there should not give the same score as being one pawn up in a knight
>ending.
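Dagh's point about drawish endings can be sketched as a scaling step applied to the raw evaluation. The factors below are purely illustrative assumptions, not taken from any real engine:

```python
# Hypothetical sketch: shrink a raw material advantage according to how
# drawish the ending type is, so +1 pawn in an opposite-colored-bishop
# ending scores lower than +1 pawn in a knight ending. All factors here
# are made up for illustration.
DRAWISHNESS = {
    "opposite_bishops": 0.35,
    "queen_ending": 0.55,
    "rook_ending": 0.70,
    "knight_ending": 0.95,
}

def scaled_eval(raw_eval_pawns, ending_type):
    """Return the evaluation after applying the drawishness factor;
    unknown ending types are left unscaled."""
    return raw_eval_pawns * DRAWISHNESS.get(ending_type, 1.0)
```

With these assumed factors, a one-pawn edge in an opposite-colored-bishop ending is worth far less than the same edge in a knight ending.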
>Queen endings are drawish too, etc.
>
>In a deeper sense, an evaluation may be better understood as a prediction of
>the result than as a count of n pawns' advantage. The task is then to
>calibrate the evaluation, depending e.g. on the piece set and extra-material
>factors, so that the engine sorts its predictions well.
>
>There is a good reason to build an evaluation function around the value of a
>pawn, namely that all the experience gathered over hundreds of years about
>relative piece values can then be added straightforwardly. But I wonder if
>there is any merit to the idea of letting the evaluation give a predicted
>percentage score directly instead (that is, a position is evaluated at 63%
>for some white advantage). One main idea is to get into the habit of
>THINKING this way while tuning the evaluation function, maybe by collecting
>stats and processing them in relation to feature bonuses (the point being
>that a [-320, 320] interval can of course be mapped more or less
>straightforwardly onto a 0%-100% interval anyway).
>
>But also, once you do it this way, you get some kind of absolute point of
>reference. Say you have worked a lot on rook ending evaluation. You take a
>large set of random rook endings with White one pawn up, average your static
>evaluation of these positions, and end up with 80%. BUT when played out by
>top engines, the score is only 63%. THEN your evaluation is probably
>misleading; it may be good for sorting predictions in +1 pawn rook endings,
>but once your engine has to compare the evaluation of rook endings vs., for
>instance, knight endings, you would probably have a problem.
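The mapping Dagh mentions, from a pawn-unit interval onto a 0%-100% predicted score, is often done with a logistic curve. A minimal sketch, assuming the Elo-style shape with a tunable scale parameter (neither taken from the post):

```python
import math

def cp_to_expected_score(cp, scale=400.0):
    """Map a centipawn evaluation to a predicted score in [0, 1] via a
    logistic curve. `scale` is a free parameter that would be tuned
    against engine-vs-engine game results; 400 is just the familiar
    Elo convention, used here as a placeholder."""
    return 1.0 / (1.0 + 10.0 ** (-cp / scale))
```

Under this assumed mapping, 0 cp gives exactly 50%, and +100 cp lands in the low-to-mid 60% range, roughly the "63% for some white advantage" flavor of prediction.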
>But at least you KNOW you have a problem :-) If the evaluation is based on
>the usual pawn unit instead and you have no standard conversion to percentage
>predictions, there really is no way to decide that the 0.93 average is too
>high and should somehow be calibrated down to 0.72 instead in order for the
>engine to compare rook ending advantages with knight ending advantages
>appropriately.

Indeed, Rybka already thinks in winning % rather than centipawns. It's just
that current GUIs expect centipawns, so this is what is output. (This may
change.) (The only minor issue here is "distance to mate".)

>But back to confidence... While a static evaluation can be given an easy
>interpretation as a predicted score, the issue gets a lot messier when
>dealing with root position evaluation derived by min-max from leaf node
>evaluations.
>
>A typical example: would you rather play a move evaluated at +1.00 at depth
>4 than another move evaluated at +0.80 at depth 12?

Indeed, a very good (and tricky) point. Chrilly posted about this here a few
weeks ago. If you search deeply enough, all advantages which are insufficient
to win will drop to 50%, and all advantages which are sufficient to win will
increase to 100%. This happens on a smaller scale when comparing 5-ply
searches with 10-ply searches.

>The answer is simple, but the consequences for engine implementation are
>not :-) The main reasons:
>
>1) Most moves in the tree are not precisely evaluated, but are only given
>bounds on their evaluation (alpha-beta).
>
>2) Engines use iterative search anyway and have a current global reference
>depth.

Yes, but some variations are searched deeper - the tree is imbalanced, and
there is no way around it.

>The big task seems to be to find a way to interpret the profile of an
>alpha-beta generated tree in smart ways that let you make "confident"
>adjustments to evaluations and confident decisions about the value of
>different search directions.
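Dagh's rook-ending calibration check can be expressed directly: compare the mean predicted score for a set of positions with the mean score when those positions are played out. A minimal sketch (the function name and interface are my own, not from the post):

```python
def calibration_gap(predicted_scores, actual_scores):
    """Mean predicted score minus mean realized score for a set of
    positions of one type (e.g. +1 pawn rook endings). A large positive
    gap means the evaluation is over-optimistic for this ending type
    and should be scaled down."""
    mean_pred = sum(predicted_scores) / len(predicted_scores)
    mean_actual = sum(actual_scores) / len(actual_scores)
    return mean_pred - mean_actual
```

In Dagh's example the static evaluations average 80% but the played-out score is 63%, a gap of 17 percentage points - the signal that the rook-ending bonus needs calibrating down.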
>One interesting question to pose is: given a tree of analysis with leaf node
>evaluations, which node would you expand in order to "improve" the tree of
>analysis the most, if given a free choice, but only one node to expand?

Yes, this is the basic search question. Inside an iterative framework, it is
rephrased as: "what is the value of expanding this node?".

>And, a follow-up question: exactly how would you measure the success of such
>a decision? It gives you the greatest improved confidence that your tree of
>analysis will decide on the "ultimately right" move one step ahead?
>(Consider: best confidence is not the same as most profitable; the possibly
>fatal consequences of different decisions are not measured and integrated in
>the question.) What if you can expand 100 nodes, one at a time - will the
>best one-step strategy lead to the best "after 100 steps" strategy? Not
>necessarily, but how often?

The value of expanding a node in the tree is 0 if the root move does not
change as a result of the expansion, and the difference in score between the
old root move and the new root move if it does change. The expected value of
expanding a node in the tree is the chance that doing so changes the move at
the root, multiplied by the expected superiority of the new move.

>Sigh, all this is highly dynamic and chaotic and holistic and... In the end,
>each move just has 3 possible principally correct evaluations, and the rest
>is just qualified speculation. Some strategies work well, others don't.
>
>Conspiracy search is one attempt to tackle the above considerations. I would
>be interested to hear what Vasik Rajlich has to add about that :P

Sorry, no comment here :)

Best regards,
Vas

>Regards,
>Dagh Nielsen
>
>>I have another question: why are computers more prone to error with dynamic
>>positions? Rybka at 19 ply still sees +0.8 for White. If Rybka sees this
>>deep, doesn't it see how the position changes?
>>
>>Still a little confused :(
>>
>>best
>>Joseph
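Vas's definition of the value of a node expansion translates almost verbatim into code. A sketch (function names are my own; the two definitions follow the post directly):

```python
def expansion_value(old_root_score, new_root_score, move_changed):
    """Realized value of one node expansion, per Vas's definition:
    0 if the best move at the root is unchanged, otherwise the score
    difference between the new root move and the old one."""
    return (new_root_score - old_root_score) if move_changed else 0.0

def expected_expansion_value(p_root_move_changes, expected_superiority):
    """Expected value before expanding: the probability that the
    expansion changes the move at the root, multiplied by the expected
    superiority of the new move over the old one."""
    return p_root_move_changes * expected_superiority
```

This is the quantity a "smart" selective search would try to maximize when choosing which node to expand next; estimating the two factors is, of course, the hard part.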