Computer Chess Club Archives

Subject: Re: Crafty Static Evals 2 questions

Author: Robert Hyatt

Date: 07:49:16 02/27/04

On February 27, 2004 at 05:24:50, martin fierz wrote:

>On February 26, 2004 at 23:14:22, Robert Hyatt wrote:
>
>>On February 26, 2004 at 17:54:09, martin fierz wrote:
>>
>>>On February 26, 2004 at 13:17:50, Robert Hyatt wrote:
>>>
>>>>On February 26, 2004 at 06:59:37, martin fierz wrote:
>>>>
>>>>>On February 25, 2004 at 12:30:38, Robert Hyatt wrote:
>>>>>
>>>>>>On February 25, 2004 at 12:09:16, Daniel Clausen wrote:
>>>>>>
>>>>>>>On February 25, 2004 at 10:52:27, Robert Hyatt wrote:
>>>>>>>
>>>>>>>>On February 25, 2004 at 05:56:16, martin fierz wrote:
>>>>>>>
>>>>>>>[snip]
>>>>>>>
>>>>>>>>>i don't know whether i should believe the eval discontinuity thing. i know
>>>>>>>>>somebody recently quoted a paper on this, but it's just a fact: exchanging any
>>>>>>>>>pieces necessarily changes the evaluation. sometimes not by very much. big
>>>>>>>>>changes are usually the exchange of the queen, the exchange of the last rook,
>>>>>>>>>the exchange of the last piece. these eval discontinuities are *real*. i don't
>>>>>>>>>believe in smoothing them out. perhaps if you write an eval with
>>>>>>>>>discontinuities it's harder to get it right so that everything fits in with
>>>>>>>>>everything else, and that's why it's supposed to be bad?!
>>>>>>>>
>>>>>>>>No.  When you have a discontinuity, you give the search something to play with,
>>>>>>>>and it can choose when to pass over the discontinuity, sometimes with
>>>>>>>>devastating results...
>>>>>>>
>>>>>>>The arguments of you two could be combined into this:
>>>>>>>
>>>>>>>   Eval discontinuities are _real_, but they hurt the search too much, and
>>>>>>>   therefore it's better to be a tad less realistic in the eval here in order
>>>>>>>   to get maximum performance out of the search+eval.
>>>>>>>
>>>>>>>
>>>>>>>Does that make any sense?
>>>>>>>
>>>>>>>Sargon
>>>>>>
>>>>>>
>>>>>>That is not quite the issue.  Consider the following X-Y plot of your
>>>>>>eval function (Y axis) against some positional component (X-axis):
>>>>>>
>>>>>>   |
>>>>>>   |
>>>>>>   |
>>>>>>   |      *
>>>>>> E |* * *   * *
>>>>>> V |            * *
>>>>>> A |                * *
>>>>>> L |
>>>>>>   |
>>>>>>   |
>>>>>>   |
>>>>>>   |
>>>>>>   |
>>>>>>   |
>>>>>>   |                    * * * * * * * * * * * * * * * * * * * * * * *
>>>>>>   |_________________________________________________________________
>>>>>>                   some feature you are evaluating
>>>>>>
>>>>>>Notice the sudden drop to zero.  If you start off in a position where the score
>>>>>>is non-zero for this term, and you can search deep enough to drive over the
>>>>>>"cliff" for this term and hit zero, strange things happen.  The search can use
>>>>>>this as a horizon-effect solution to some problem.  And it will be able to use
>>>>>>that sudden drop (when something goes too far) as opposed to the big bonus just
>>>>>>before it goes too far, to manipulate the score, the path, the best move, and
>>>>>>possibly the outcome of the game.
>>>>>>
>>>>>>This is what Berliner's paper was about.  I suspect that anybody that has worked
>>>>>>on a chess engine for any length of time has run across this problem and had to
>>>>>>solve it by smoothing that sudden drop so that there is no "edge condition" that
>>>>>>the search can use to screw things up.
>>>>>
>>>>>another reason for not believing this stuff: your above graph shows *exactly*
>>>>>what happens when you go from a non-EGTB position to an EGTB position (or, for
>>>>>that matter, what happens when you go into any position your program can
>>>>>recognize as a draw whether it has tablebases or not): your eval thinks it's
>>>>>doing great, but the exchange of something leads to a drawn position in your
>>>>>tablebases. are you going to claim that crafty plays better without TBs?
>>>>>:-)
>>>>
>>>>Nope, not the same thing.  EGTB info is _perfect_.  The eval is not.
>>>
>>>why did i know you would say that? :-)
>>>
>>>i just don't believe it. perhaps the eval is not perfect, so what? if your
>>>argument is correct, then there must be some threshold for the "degree of
>>>correctness" for the eval discontinuity to work. if it's "correct enough", it
>>>will work - like EGTB info which has 100% correctness. what makes you think
>>>other eval terms cannot be "correct enough"?
>>>
>>>cheers
>>>  martin
>>
>>All I can say is that _everybody_ has seen the effect.  It is well-known, and
>>causes problems.  One example: just say "endgame starts here" at a specific
>>material count, and watch what happens.  When you are right around that material
>>level, you will see odd things happen, from making poor positional moves that
>>lose the game, to avoiding making good moves, because the program either wants
>>or doesn't want to "cross the bridge".  If you make the transition smoother,
>>then there is no bridge to cross, just a small step at a time and you end up
>>where you want without the new type of horizon effect problems a discontinuity
>>causes.
>>
>>Of course, if you don't believe it, that is perfectly fine.  But I'll bet
>>dollars to doughnut holes that one day you will say "hmm...  perhaps Bob (and
>>many others) was actually right here..."  :)
>
>
>looks like you have run out of arguments if you can't counter my
>"level-of-correctness" thing with anything better than "everybody has seen the
>effect"!
>you are getting very close to "i have seen this and therefore it must be this
>way" that you-know-who always uses :-)

What more can I say?  I cited a paper by Berliner that gives a very good
description of the problem.  I cited examples that have burned me in the past,
and I pointed out that nearly everybody who works on a chess program for any
length of time discovers the problem for themselves.

I pointed out that the EGTB issue is not the same, because it is tied to
captures, which the search already tries to handle, although we have all seen
some oddball behavior there as well.  At least there the discontinuity goes
from imperfect information to perfect information.  The other way of hitting
this problem goes from imperfect to imperfect: the eval makes a significant
jump, and the program can choose where that jump happens, without the search
extensions being able to limit it very effectively.
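
To make the kind of jump I mean concrete, here is a toy C sketch (invented
names and numbers, nothing from Crafty's actual code): an eval term that
switches on all at once when the material count crosses an "endgame starts
here" line, so the search gets to pick the ply on which the whole gap appears
or disappears.

/*
 * Toy sketch (invented names and numbers, not Crafty's code).
 * A hard "endgame starts here" rule: the instant total non-pawn
 * material drops below a threshold, a big king-centralization bonus
 * switches on all at once.  The search can time the capture that
 * crosses the line, so the whole gap shows up on a ply of its
 * choosing, which is a ready-made horizon-effect lever.
 */
#include <stdio.h>

#define ENDGAME_MATERIAL  1800   /* centipawns of non-pawn material    */
#define KING_CENTER_BONUS  120   /* switched on all at once -- the gap */

int eval_king_term(int nonpawn_material, int king_centralization)
{
    if (nonpawn_material < ENDGAME_MATERIAL)
        return KING_CENTER_BONUS * king_centralization / 100;
    return 0;   /* middlegame: term contributes nothing */
}

int main(void)
{
    /* one exchange of material and the term jumps by the full bonus */
    printf("%d\n", eval_king_term(1850, 100));  /* just above the line: 0   */
    printf("%d\n", eval_king_term(1750, 100));  /* just below the line: 120 */
    return 0;
}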

I.e., when you decide to do a parallel search, you can either listen to someone
who has already done it, and who most likely understands most of the issues you
will face, or you can do it "on your own", make the _same_ mistakes others have
already made, and then say "dang, I should have thought of that first..."

This is the _same_ kind of deal.  All evals are discontinuous, since they use
integer values rather than real values; the issue here is the "gap size".  As
the gap gets bigger, the search will find interesting ways to use the horizon
effect so that the gap influences the score in ways you never intended.

It is as sure as rain...
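
Here is the same toy term with the gap smoothed out (again just a sketch with
made-up numbers, not any particular program's code): interpolate between the
middlegame and endgame values by remaining material, so every exchange moves
the term by a small step instead of one big jump.

/*
 * Toy sketch of the smoothed version: instead of an on/off switch,
 * interpolate the king-centralization term between its middlegame
 * value (0) and its endgame value by remaining non-pawn material.
 * Each exchange now moves the score by a small step, so there is no
 * single "cliff" for the search to park just beyond the horizon.
 */
#include <stdio.h>

#define OPENING_MATERIAL  6200   /* rough total non-pawn material at the start */
#define KING_CENTER_BONUS  120

int eval_king_term_smooth(int nonpawn_material, int king_centralization)
{
    int m = nonpawn_material;
    if (m > OPENING_MATERIAL) m = OPENING_MATERIAL;
    if (m < 0)                m = 0;
    /* weight runs from 0 (full material) to 100 (bare kings) */
    int endgame_weight = 100 * (OPENING_MATERIAL - m) / OPENING_MATERIAL;
    return KING_CENTER_BONUS * king_centralization / 100
           * endgame_weight / 100;
}

int main(void)
{
    /* the same exchange now changes this term by only a point or two */
    printf("%d\n", eval_king_term_smooth(1850, 100));
    printf("%d\n", eval_king_term_smooth(1750, 100));
    return 0;
}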

>
>to state that _everybody_ has seen this is obviously wrong since you haven't
>spoken to everybody, and mainly not to those who have written the best programs
>out there, since they don't reveal their tricks - the commercials. one striking
>example of a big eval discontinuity is published on ed schröder's page about
>rebel: rebel goes from a complex king safety eval to ZERO king safety eval once
>the queen goes off the board. now if that isn't an eval discontinuity, then i
>don't know what an eval discontinuity is supposed to be. and it works for him,
>it seems, and i think he's quite a trustworthy source of information when it
>comes to chess programming. so are you of course, but it's not like everybody
>seems to be in agreement with you here...

OK "everybody" means "most computer chess programmers that have been doing this
for a reasonable length of time."  I assumed that would be pretty obvious...


>
>i'll reiterate my point of view: eval discontinuities *are* dangerous. they must
>be well-tuned so that they don't produce unexpected and/or bad results. but they
>are a real fact, well known to any human expert, and reflecting such a fact in
>your eval can never be bad, if it is sufficiently correct. whatever
>"sufficiently" means...
>


"sufficiently" means "no big discontinuities" nothing more, nothing less.



>perhaps it's practically impossible to write a "sufficiently correct" heuristic
>eval, because it gets very difficult to get the tuning right. but from a
>theoretical point of view, if you accept that tablebases are good for a program
>you have absolutely no chance in this argument!
>
>cheers

So you don't see the difference between a discontinuity from an imperfect score
to another imperfect score, vs. a discontinuity from an imperfect score to a
perfect score?  I do.  Certainly both cause problems; we see that all the time.
But one is a _much_ larger problem than the other.


>  martin
>
>


