Computer Chess Club Archives

Search

Terms
Messages

Subject: Re: How to evaluate KQ vs KR?

Author: Vasik Rajlich
Date: 07:17:19 05/06/04
On May 06, 2004 at 08:42:35, Tord Romstad wrote:

>On May 06, 2004 at 07:12:37, Vasik Rajlich wrote:
>
>>On May 06, 2004 at 05:24:29, Tord Romstad wrote:
>>
>>>On May 05, 2004 at 14:35:10, Vasik Rajlich wrote:
>>>
>>>>Well, for KQKR I'm sure you can in a few hours come up with something effective,
>>>>even at low search depths. However, for things like KRPKR and KPPKP, tablebases
>>>>are a really nice solution that you won't easily replace.
>>>
>>>It's funny that you bring up KRPKR as an example, because I already have a
>>>specialized KRPKR eval which works reasonably well in practise.  It recognizes
>>>some of the most basic won or drawn positions (including the Philidor and
>>>Lucena positions), and also has some of the heuristic knowledge you will find
>>>in rook endgame books (king on the short side, rook on the long side, using
>>>the rook to cut off the defending king, etc.).  It is nowhere near perfect,
>>>but it is good enough to have saved a huge number of half points against
>>>other engines in my test matches.
>>
>>Do you mean that if you pass just about any KRPKR endgame to your evaluation
>>function, it will return a score of either 0.0 or +/- MAX_SCORE? I think this is
>>what you want (ideally). If you're doing this, then you better be really sure
>>that you're right.
>
>No, it's not nearly as good as that.  It is good enough to win almost all
>won KRPKR positions, and to draw all drawn positions.  It is not generally
>accurate enough to determine whether some particular position is won or drawn,
>though.
>
>>Eventually I hope to have my evaluation do something close to this with more
>>complex endings. Apparently it's not so easy though. Even Shredder and Fritz
>>have the problem of transitioning an advantage into a drawish (or even drawn)
>>ending, while still giving big scores.
>
>Yes, it is very hard, but I hope it is possible to learn something useful
>by trying.
>
>>KPPKP by the way would be next-to-impossible to do with heuristics, although at
>>least this will happen much less in practice than KRPKR.
>
>Handling all KPPKP endgames by heuristics is probably next-to-impossible, but
>I think it should be possible to evaluate a big subset of all such endgames
>by a few simple heuristics.  As a simple example, if the attacking king is
>somewhere in front of the enemy pawn, and the two pawns of the stronger side
>is so far apart that the defending king cannot simultaneously stop both of
>them, the side with two pawns always wins.

This will be really really hard.

For example, does the following position fit your above heuristic?

[D] 8/1p6/8/8/8/7k/P6P/1K6 w - - 0 1

Ok, so maybe we change the heuristic: both pawns must be passed. Still doesn't
work:

[D] 8/8/8/5k2/8/3p1P2/P7/K7 w - - 0 1

Here is another interesting position. White is drawing here. What sort of
heuristics can you come up with, to at least avoid concluding that black wins:

[D] 7K/8/k1P5/7p/8/8/8/8 w - - 0 1

If you try to go further, to general king and pawn endgames, there are all sorts
of funny queening tactics, etc.

>
>A reasonable approach is to begin by classifying KPPKP endgames (or some
>other class of basic endgames) as "win", "draw", "loss" or "don't know".
>Initially the "don't know" class will be very big, but by adding more and
>more rules and special cases you should be able to improve the situation.
>After some time you will have a horribly complicated function with a lot
>of special cases.  Now it might be time to have a close look at the code
>and see whether it is possible to generalize and discover some higer-level
>rules which allow you to simplify your function and eliminate some of the
>special cases.  If you are lucky, you might even discover some principles
>which can be used when evaluating more complex endgames as well.
>
>This is nothing more than philosophy at the moment.  I haven't yet tried
>to use such techniques in practise.
>

Yes, I have done zilch work on the endgame. (In Rybka piece values change
slightly in the endgame, and Rybka knows about some trivial things like opposite
colored bishops & no pawns on both wings.)

However, I had the exact same idea which you give above. Endgame positions which
can be evaluated with more certainty will need less search. The entire endgame
search should probably be quite a bit different than middlegame search. We
already know that null move isn't as good - though still good enough to keep
around. OTOH the concept of progress might be useful. (In the middlegame it's
probably worthless.)

Practically it will probably be six months at least before I start to implement
any of this.

>>>Implementing it took a few days rather than a few hours, though.
>>>
>>>>Note also that 20 elo points is nothing to scoff at.
>>>
>>>Perhaps not, but 20 Elo points is a *very* generous estimate of the
>>>improvement provided by tablebases.  The data I have seen indicate that
>>>the strength improvement is hardly noticable.
>>>
>>>>If you speed up your engine
>>>>by 40%, that's about what you'll gain, and we've seen how far some people will
>>>>go to get this. You're probably at the point with Gothmog where you'll happily
>>>>work for a month to get a ten-point increase.
>>>
>>>A "month" is not a very precisely defined term when it comes to chess
>>>programming.  Like all other amateurs, I don't always have the opportunity
>>>to work equally much.  During some months, I hardly find any time to program
>>>at all, but it also happens that I work 10 or 15 hours per month.  With
>>>something like 15 hours of programming and 20 days of CPU time, a ten-point
>>>increase isn't hard to achieve with an engine at Gothmog's level.  There
>>>are so many pieces of missing knowledge, so many search and eval weights to
>>>tune, and so many stupid bugs to fix.
>>>
>>
>>Hmmm. I'm skeptical, 10 points in fifteen hours is quite a bit.
>
>Please note that I meant 15 hours of *programming* time, but hundreds of
>hours of CPU time.  Because my search and eval is still very badly tuned,
>the probability that a random (but sensible) modification of some search
>parameter increases the playing strength is almost 50%.  It should be
>easy to improve by 10 points in 15 hours (probably much less) of programming
>time simply by experimenting with such changes and playing lots of test
>games.
>
>As an example, last week I tried to divide all search reductions by two
>when the remaining depth is 3 plies or less.  In order to compensate for
>the reduced speed, I increased the number of reductions when the remaining
>depth is big.  After 200 test games, the new version beat the old version
>by 117-83 (+74,-40,=86).  In performance rating, the difference is almost
>60 points in favor of the new version.  According to Rémi Coulom's "whoisbest"
>utility, the probability that the newer version is stronger is 99.92%.
>Whether the new version is stronger against other engines remains to be
>seen, but so far it looks promising.  It won 51.5-48.5 against Yace Paderborn,
>and is currently leading 16.5-9.5 against SmarThink 0.18a r130.
>
>This change was almost completely random.  I had no reason to expect that
>the change would be an improvement, it was simply an experiment which turned
>out to be successful.  Luck rather than skill.
>
>I think all engines near Rybka's and Gothmog's level can easily be
>significantly improved by such simple changes.  There are so many parameters
>to tune, and most of them almost certainly have values which are not even
>close to the optimal.
>

Yes. I find some very interesting things with testing, that are not always
intuitive. Just yesterday I found some interesting stuff about the MTD (f)
driver. If you want I can email you the logs, it's not really for posting.

BTW a few days ago, I ordered two Athlon 64s which will run tests 24/7. Maybe
the way to go is get something like ten of them, and set up some sort of
automated random changes procedure. I'll see how it goes with the two for now.

>>I'm getting much
>>less than that with Rybka. (Although a lot of my work is still on infrastructure
>>and testing procedures, so it's hard to give a fair #.)
>
>That's the important point, of course.  If we concentrated exclusively on
>short-term gains in playing strength, our engines would improve much more
>rapidly.
>
>>BTW I've been including
>>Gothmog in my gauntlets, and against Rybka it scores slightly better than Crafty
>>and Pepito (but not as well as Fritz 5.32 or Ruffian).
>
>Interesting, but do you run the games with or without pondering?  Crafty
>is seriously crippled when running without pondering, as Bob has explained
>many times.  If I recall correctly, the problem is that the time managment
>doesn't work as intended when pondering is disabled.

Without pondering. Actually I guess I don't care if Crafty is slightly crippled,
as long as it's consistently slightly crippled. It could explain the difference
though.

>
>For this reason, I very rarely test against Crafty.
>
>>I think it's for the same
>>reason that Shredder 8 scores about 100 points better than Fritz 8 - Rybka's
>>search is better than its eval, so it's more vulnerable to king attacks.
>
>I also see inconsistent results against different opponents.  Gothmog loses
>by crushing margins against Amateur and The Baron, but is not far below
>50% against Pepito.
>
>I think Gothmog's search is also better than its eval, by the way.
>
>>>>Re. the aesthetics, you can also think of tablebases as far more general than
>>>>any heuristics you'll come up with to replace them. The tablebase code knows
>>>>only about mates and draws, and its search figures out the rest. It's just an
>>>>implementation detail that you precompute the results. :-)
>>>
>>>It is not entirely accurate to say that I precompute the results.  The
>>>results have been precomputed by Nalimov's code, which I don't even
>>>understand myself.  This is another reason why I prefer not to use TBs.
>>>
>>
>>Well, ok, aesthetics are aesthetics. I could argue that you happily use a
>>compiler, which also does all sorts of things you don't understand. :-)
>
>Yes.  Aesthetics are aesthetics, as you say, and they often don't make
>much sense from a rational point of view.  :-)
>

Well, I remember an old philosophy class I took where the professor insisted
that aesthetics are objective, ie that one work of art could be considered
better than another, etc. Of course maybe he knew that saying stuff like this
was the only way we'd pay attention :-)

Vas

>Tord
This page took 0 seconds to execute
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.