Computer Chess Club Archives


Subject: Re: Knowledge again, but what is it?

Author: Don Dailey

Date: 09:13:57 02/25/98



On February 25, 1998 at 05:09:41, Amir Ban wrote:

>On February 25, 1998 at 00:15:59, Don Dailey wrote:
>
>>Well said Fernando,
>>
>>I don't think it's possible to separate the two concepts.  Search IS
>>knowledge (one form of it.)   I think we tend to artificially separate
>>the two concepts which is probably a mistake.  It's similar to chess
>>players arguing over tactical vs positional play.  It's more a human
>>concept than anything else.   I fear we do subject our programs to
>>our own superstitions but cannot help ourselves.
>>
>
>Actually I think it would be very interesting to separate the concepts,
>but I don't see that we know how.

I once actually tried to do something like this but
was not successful.  I wanted to define positional
vs. tactical chess (even if arbitrarily), but there
were several problems.  For one thing, even the
values of the pieces must be considered a very
relevant piece of knowledge, so already your search
is knowledge based.  I might count a few other terms
(piece-cooperation terms such as the bishop pair) as
purely material, but that is even more knowledge.
The truth of the matter, in my opinion, is that it
is futile to separate the two.  If I had to define
PURE tactics, I would say it is only knowledge of
win, loss, or draw as seen at the end nodes of a
search.  A quiescence search might not even be
considered relevant in this kind of search.
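
To make that definition concrete, here is a minimal sketch of
such a searcher (Python; the moves, play, and outcome hooks are
hypothetical stand-ins for real game rules, and treating an
unresolved leaf as a draw is just an arbitrary choice, since a
pure-tactics leaf has nothing else to say):

  # Sketch of a "pure tactics" searcher: the only leaf knowledge is
  # the game outcome itself.  The rules are passed in as functions:
  # moves(pos) lists legal moves, play(pos, m) returns the resulting
  # position, outcome(pos) returns WIN/DRAW/LOSS for the side to move
  # when the game is over, or None otherwise.

  LOSS, DRAW, WIN = -1, 0, 1

  def pure_tactics(pos, depth, moves, play, outcome):
      known = outcome(pos)
      if known is not None:
          return known
      if depth == 0:
          return DRAW    # unresolved: no material or positional guess
      best = LOSS
      for m in moves(pos):
          best = max(best, -pure_tactics(play(pos, m), depth - 1,
                                         moves, play, outcome))
      return best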

All of the chess concepts we read about in books and
apply to our own chess (and to our programs) are in a
sense purely artificial.  The pieces have no real
fixed values; their values are only estimates of
winning chances.  When checkmate happens it is
completely irrelevant who has extra pieces, backward
pawns, etc.

If you consider a purely tactical program (one that
knows only win, loss, and draw) capable of a few
hundred plies of searching, you will see right away
that our chess ideas and concepts are designed for
the human mind (and, as it turns out, useful for
chess program evaluation functions too!); they have
only abstract meaning.  Such a deep search does not
need them!

I did plot my program's evaluation against actual win
percentages a couple of years ago and found a very nice
smooth curve.  The score was an excellent predictor of
winning chances.  But as you say, this does not really
tell me how good the evaluation function is.
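
If anyone wants to reproduce that kind of curve, the bookkeeping
is roughly this (a sketch in Python; the (score, result) pairs
are assumed to come from your own program and game database):

  # Sketch: bucket evaluation scores and measure the actual win
  # percentage in each bucket.  'positions' is assumed to be a list
  # of (score_in_pawns, result) pairs, with result 1.0 for a win,
  # 0.5 for a draw, 0.0 for a loss, all from the side to move's view.

  from collections import defaultdict

  def win_rate_curve(positions, bucket_width=0.25):
      totals = defaultdict(float)
      counts = defaultdict(int)
      for score, result in positions:
          bucket = round(score / bucket_width) * bucket_width
          totals[bucket] += result
          counts[bucket] += 1
      return {b: totals[b] / counts[b] for b in sorted(counts)}

  # A well-scaled evaluation should give a smooth, monotonic mapping:
  #   for score, pct in win_rate_curve(my_positions).items():
  #       print("%+.2f  ->  %.1f%%" % (score, 100.0 * pct))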

Here is a useful experiment someone could try:

Plot this graph over thousands of positions with a known
result, for example a large database of master games.  Now take
any single term (I'll use an isolated pawn on an open file with
rooks present for my example) and pull out just the positions
where this feature is present.  Make the same plot with this
subset of positions.

Now compare the resulting plot against the first plot.
I argue that the plots should match (up to minor statistical
error.)  If they do not match, it's a good indication that your
isolated pawn weight does not fit well with the rest of your
evaluation.  For instance, if a score of 0.50 represents an 80%
chance of winning overall, but only a 60% chance of winning when
considering the isolated pawn positions, then your isolated pawn
value is too high (relative to the rest of the evaluation.)  In
other words, your position is not as good as your program thinks
it is; the isolated pawn term jacks the score up too high.
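
A rough sketch of that comparison, building on the win_rate_curve
bookkeeping above (Python; the has_feature flag is a hypothetical
stand-in for whatever feature detector your program provides, such
as the isolated pawn term in the example):

  # Sketch: given the score-to-win-percentage curve for the full set
  # and for the feature subset (plain dicts mapping score bucket to
  # win percentage), report where the two curves diverge.

  def compare_curves(full, subset):
      for b in sorted(set(full) & set(subset)):
          gap = subset[b] - full[b]
          # A consistently negative gap suggests the feature's weight
          # is too high relative to the rest of the evaluation.
          print("score %+.2f: all %.1f%%, feature %.1f%%, gap %+.1f%%"
                % (b, 100 * full[b], 100 * subset[b], 100 * gap))

  # Usage, assuming each entry is (score, result, has_feature):
  #   full   = win_rate_curve([(s, r) for s, r, f in positions])
  #   subset = win_rate_curve([(s, r) for s, r, f in positions if f])
  #   compare_curves(full, subset)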

We have a student who wants to experiment with automated fine
tuning based on this principle.  There are a lot of hairy issues,
such as how to get a reasonably representative sample of positions
and other questions of testing methodology, but it might turn out
to be interesting.
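
Nothing about that project is settled, but one simple shape such
tuning might take is a loop like this (a sketch only, not the
student's actual method; the error function would be something
like the total prediction error over a labeled set of positions,
as discussed below):

  # Sketch: adjust one evaluation weight at a time to reduce a given
  # error function.  'weights' is a dict of evaluation weights;
  # 'error_fn' is any callable taking the weights and returning a
  # number to be minimized.

  def tune_weight(weights, name, error_fn, step=0.05):
      # Simple coordinate descent on a single weight.
      best = error_fn(weights)
      improved = True
      while improved:
          improved = False
          for delta in (+step, -step):
              weights[name] += delta
              err = error_fn(weights)
              if err < best:
                  best, improved = err, True
              else:
                  weights[name] -= delta    # revert a bad step
      return weights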

Your question of how to compare evaluation functions without
searching: perhaps there is a way to compare their degree of
error.  An evaluation of zero (a .5 probability) all the time
will predict win probability correctly, as you state, but can be
very wrong if you choose a sample of positions that are all wins!
It seems that we have to start with a large set of positions that
are absolutely known to be wins, losses, or draws.  THEN we can
judge the evaluation function by its total error.  Some type of
fit needs to be made to the score so that the evaluation function
returns an expectancy of winning, and not a conventional score.
Then we can judge it on a position-by-position basis for total
error.  In this way, simply giving an even evaluation score on
every position should greatly increase its error.  Of course, how
to construct a set like this is not so clear.
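
As a concrete version of that error measure, a small sketch
(Python; the logistic score-to-expectancy mapping and its constant
k are assumptions, and the labeled set is whatever known-outcome
collection you manage to build):

  # Sketch: judge an evaluation function by its total error over
  # positions with known outcomes.  Scores in pawns are first fitted
  # to a winning expectancy with a logistic curve; the constant k is
  # an assumption and would itself be fitted in practice.

  import math

  def expectancy(score, k=1.0):
      return 1.0 / (1.0 + math.exp(-k * score))

  def total_error(scored, k=1.0):
      # scored: list of (score, known_result) pairs, known_result
      # being 1.0 for a win, 0.5 for a draw, 0.0 for a loss.
      return sum((expectancy(s, k) - r) ** 2 for s, r in scored)

  # A constant score of 0.0 predicts 0.5 everywhere; on a set of
  # known wins and losses its total error is large, which is exactly
  # the point of judging position by position against known outcomes.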

- Don


>When we talk about knowledge, we usually mean evaluation. (This is the
>part that the user senses. I know that some people also call "knowledge"
>all kinds of fancy stuff whose purpose is to guide and help the search
>engine. I don't quite agree, but for argument's sake let's stick to
>evaluation). So a "knowledge" program has good evaluation, better than a
>non-knowledge program anyway.
>
>We all know what the BEST evaluation is. It's the one coming out of
>perfect knowledge of the game. But what is good evaluation ? More
>precisely, given two evaluation functions, how do you decide which is
>better ?
>
>I think most people, after a moment's thought would say: Put a search
>engine on top of both and let them play a match (or run a suite, if you
>will). This is not satisfactory, for practical reasons, because some
>engines will do better with certain kinds of evaluations, or will behave
>differently at different time controls, and for theoretical reasons,
>because you would expect that good (or better) evaluation is something
>that can and should be defined without the parasite of searching. When
>you have an "objectively good" evaluation, you would expect that any
>engine would do well with it.
>
>Can anyone come forward with a way of comparing evaluation functions,
>not necessarily a practical one, that does not involve searching ?
>
>Let me make a try at this: The evaluation of a position should be a
>measure of the expected outcome of the game (with assumed perfect play),
>i.e. it can be mapped to a probability of winning. Say, with a score of
>+0.5 you expect to win 65%, and with a score of -4 you expect to win
>0.6%. The probability for winning should be monotonic with the score, or
>else something is bad with the function. So one way to define a better
>evaluation is if it is more monotonic. You can also actually decide on
>score-to-probability mapping with some exponential say, and declare that
>the better evaluation is the one that fits better the mapping.
>
>I think this definition is on the right track, but there is something
>clearly wrong with it: First, there is an almost infinite number of such
>functions that would give you a perfect fit, but most of them are
>nonsense. For example, an evaluation function that always returns 0 is
>perfect in this sense, but obviously useless. A less extreme example is
>an evaluation that limits itself to score from +0.5 to -0.5, but does
>that perfectly. This is a perfect but wishy-washy evaluation, so not
>very useful, because practically you need to know that capturing the
>queen gives you 99.9% win, and this evaluation never guarantees more
>than say 70%. Of course, if you are already a piece ahead, this
>evaluation gives you no guidance at all.
>
>I'm sure this definition can be improved on, but currently I don't know
>how.
>
>Amir


