Computer Chess Club Archives


Subject: Re: How to evaluate KQ vs KR?

Author: Tord Romstad

Date: 06:16:02 05/06/04

On May 06, 2004 at 09:09:44, Uri Blass wrote:

>On May 06, 2004 at 08:42:35, Tord Romstad wrote:
>
>>On May 06, 2004 at 07:12:37, Vasik Rajlich wrote:
>>
>>>On May 06, 2004 at 05:24:29, Tord Romstad wrote:
>>>
>>>>On May 05, 2004 at 14:35:10, Vasik Rajlich wrote:
>>>>
>>>>>Well, for KQKR I'm sure you can in a few hours come up with something effective,
>>>>>even at low search depths. However, for things like KRPKR and KPPKP, tablebases
>>>>>are a really nice solution that you won't easily replace.
>>>>
>>>>It's funny that you bring up KRPKR as an example, because I already have a
>>>>specialized KRPKR eval which works reasonably well in practice.  It recognizes
>>>>some of the most basic won or drawn positions (including the Philidor and
>>>>Lucena positions), and also has some of the heuristic knowledge you will find
>>>>in rook endgame books (king on the short side, rook on the long side, using
>>>>the rook to cut off the defending king, etc.).  It is nowhere near perfect,
>>>>but it is good enough to have saved a huge number of half points against
>>>>other engines in my test matches.
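
In code, one piece of such a recognizer might look roughly like the C sketch
below.  This is only a toy illustration, not the actual Gothmog code; the
0..63 square numbering and the function names are made up for the example.

/* Toy sketch of one hand-written KRPKR rule (not real engine code).
   Squares are numbered 0..63 with a1 = 0 and h8 = 63, and White is
   assumed to be the side with the pawn. */

#include <stdlib.h>

#define FILE_OF(sq) ((sq) & 7)
#define RANK_OF(sq) ((sq) >> 3)

/* Chebyshev (king move) distance between two squares. */
static int square_distance(int s1, int s2) {
    int df = abs(FILE_OF(s1) - FILE_OF(s2));
    int dr = abs(RANK_OF(s1) - RANK_OF(s2));
    return df > dr ? df : dr;
}

/* Returns 1 if the position looks like a Philidor-style draw: the
   defending king sits on or next to the pawn's queening square and
   the pawn has not yet reached the sixth rank.  A usable recognizer
   would also look at rook placement, side to move and a pile of
   exceptions before daring to return a hard draw score. */
int krpkr_looks_drawn(int w_pawn, int b_king) {
    int queening_sq = FILE_OF(w_pawn) + 56;
    return square_distance(b_king, queening_sq) <= 1 && RANK_OF(w_pawn) < 5;
}

A special-case KRPKR evaluation is essentially a collection of such
conditions, each mapping a recognized pattern to a drawish or winning score.
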
>>>
>>>Do you mean that if you pass just about any KRPKR endgame to your evaluation
>>>function, it will return a score of either 0.0 or +/- MAX_SCORE? I think this is
>>>what you want (ideally). If you're doing this, then you better be really sure
>>>that you're right.
>>
>>No, it's not nearly as good as that.  It is good enough to win almost all
>>won KRPKR positions, and to draw all drawn positions.  It is not generally
>>accurate enough to determine whether some particular position is won or drawn,
>>though.
>>
>>>Eventually I hope to have my evaluation do something close to this with more
>>>complex endings. Apparently it's not so easy though. Even Shredder and Fritz
>>>have the problem of transitioning an advantage into a drawish (or even drawn)
>>>ending, while still giving big scores.
>>
>>Yes, it is very hard, but I hope it is possible to learn something useful
>>by trying.
>>
>>>KPPKP, by the way, would be next-to-impossible to do with heuristics, although
>>>at least it will come up much less often in practice than KRPKR.
>>
>>Handling all KPPKP endgames by heuristics is probably next-to-impossible, but
>>I think it should be possible to evaluate a big subset of all such endgames
>>by a few simple heuristics.  As a simple example, if the attacking king is
>>somewhere in front of the enemy pawn, and the two pawns of the stronger side
>>are so far apart that the defending king cannot simultaneously stop both of
>>them, the side with two pawns always wins.
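
For concreteness, one sufficient version of this rule could be coded roughly
as follows (a hypothetical C sketch: squares are 0..63 with a1 = 0, White is
the side with two pawns, and side to move, blocked pawn paths and stalemate
tricks are ignored).  The really interesting case, where the defending king
could stop either pawn alone but not both, would need a slightly more careful
test.

/* Toy sketch of one sufficient KPPKP winning condition: the white
   king blockades the black pawn, and the black king cannot even
   catch one of the two white pawns.  Not code from any engine. */

#include <stdlib.h>

#define FILE_OF(sq) ((sq) & 7)
#define RANK_OF(sq) ((sq) >> 3)

/* Chebyshev (king move) distance between two squares. */
static int square_distance(int s1, int s2) {
    int df = abs(FILE_OF(s1) - FILE_OF(s2));
    int dr = abs(RANK_OF(s1) - RANK_OF(s2));
    return df > dr ? df : dr;
}

/* 1 if the black king is outside the "square" of a white pawn and
   can no longer catch it on its way to promotion. */
static int outside_square(int b_king, int w_pawn) {
    int queening_sq = FILE_OF(w_pawn) + 56;
    return square_distance(b_king, queening_sq) > 7 - RANK_OF(w_pawn);
}

/* Sufficient (far from necessary) condition for a white win. */
int kppkp_white_wins(int w_king, int w_pawn1, int w_pawn2,
                     int b_king, int b_pawn) {
    if (w_king != b_pawn - 8)   /* king directly in front of the black pawn */
        return 0;
    return outside_square(b_king, w_pawn1) || outside_square(b_king, w_pawn2);
}
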
>>
>>A reasonable approach is to begin by classifying KPPKP endgames (or some
>>other class of basic endgames) as "win", "draw", "loss" or "don't know".
>>Initially the "don't know" class will be very big, but by adding more and
>>more rules and special cases you should be able to improve the situation.
>>After some time you will have a horribly complicated function with a lot
>>of special cases.  Now it might be time to have a close look at the code
>>and see whether it is possible to generalize and discover some higher-level
>>rules which allow you to simplify your function and eliminate some of the
>>special cases.  If you are lucky, you might even discover some principles
>>which can be used when evaluating more complex endgames as well.
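
The overall shape of such a classifier might be something like the skeleton
below, again only a hypothetical C sketch: the Position type and the rule
names are placeholders, and each rule would encode one pattern of the kind
sketched above.

#include <stddef.h>

/* Placeholder for whatever board representation the engine uses. */
typedef struct Position Position;

typedef enum { EG_WIN, EG_DRAW, EG_LOSS, EG_UNKNOWN } eg_result;

/* Each rule either classifies the position or passes with EG_UNKNOWN. */
typedef eg_result (*eg_rule)(const Position *pos);

/* Hypothetical hand-written rules; stubs here so the sketch compiles. */
static eg_result rule_unstoppable_passer(const Position *pos) { (void)pos; return EG_UNKNOWN; }
static eg_result rule_blockaded_pawns(const Position *pos)    { (void)pos; return EG_UNKNOWN; }

static eg_rule kppkp_rules[] = { rule_unstoppable_passer, rule_blockaded_pawns };

/* Try the rules in order; anything not covered stays "don't know"
   and is left to the ordinary evaluation and search. */
eg_result classify_kppkp(const Position *pos) {
    for (size_t i = 0; i < sizeof kppkp_rules / sizeof kppkp_rules[0]; i++) {
        eg_result r = kppkp_rules[i](pos);
        if (r != EG_UNKNOWN)
            return r;
    }
    return EG_UNKNOWN;
}
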
>>
>>This is nothing more than philosophy at the moment.  I haven't yet tried
>>to use such techniques in practice.
>>
>>>>Implementing it took a few days rather than a few hours, though.
>>>>
>>>>>Note also that 20 elo points is nothing to scoff at.
>>>>
>>>>Perhaps not, but 20 Elo points is a *very* generous estimate of the
>>>>improvement provided by tablebases.  The data I have seen indicate that
>>>>the strength improvement is hardly noticeable.
>>>>
>>>>>If you speed up your engine
>>>>>by 40%, that's about what you'll gain, and we've seen how far some people will
>>>>>go to get this. You're probably at the point with Gothmog where you'll happily
>>>>>work for a month to get a ten-point increase.
>>>>
>>>>A "month" is not a very precisely defined term when it comes to chess
>>>>programming.  Like all other amateurs, I can't always put in the same amount
>>>>of work.  During some months, I hardly find any time to program
>>>>at all, but it also happens that I work 10 or 15 hours per month.  With
>>>>something like 15 hours of programming and 20 days of CPU time, a ten-point
>>>>increase isn't hard to achieve with an engine at Gothmog's level.  There
>>>>are so many pieces of missing knowledge, so many search and eval weights to
>>>>tune, and so many stupid bugs to fix.
>>>>
>>>
>>>Hmmm. I'm skeptical; 10 points in fifteen hours is quite a bit.
>>
>>Please note that I meant 15 hours of *programming* time, but hundreds of
>>hours of CPU time.  Because my search and eval is still very badly tuned,
>>the probability that a random (but sensible) modification of some search
>>parameter increases the playing strength is almost 50%.  It should be
>>easy to improve by 10 points in 15 hours (probably much less) of programming
>>time simply by experimenting with such changes and playing lots of test
>>games.
>>
>>As an example, last week I tried to divide all search reductions by two
>>when the remaining depth is 3 plies or less.  In order to compensate for
>>the reduced speed, I increased the number of reductions when the remaining
>>depth is big.  After 200 test games, the new version beat the old version
>>by 117-83 (+74,-40,=86).  In performance rating, the difference is almost
>>60 points in favor of the new version.  According to Rémi Coulom's "whoisbest"
>>utility, the probability that the newer version is stronger is 99.92%.
>>Whether the new version is stronger against other engines remains to be
>>seen, but so far it looks promising.  It won 51.5-48.5 against Yace Paderborn,
>>and is currently leading 16.5-9.5 against SmarThink 0.18a r130.
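
In code, the change has roughly the following shape (an illustrative C
fragment, not the actual Gothmog code; the depth thresholds and the amount of
extra reduction are just example numbers):

/* Hypothetical sketch of a depth-dependent reduction tweak: whatever
   base reduction the search has chosen is halved close to the leaves
   and increased a little when plenty of depth remains. */

#define PLY 1   /* one ply, in whatever unit the engine measures depth */

int adjust_reduction(int base_reduction, int remaining_depth) {
    if (remaining_depth <= 3 * PLY)
        return base_reduction / 2;      /* reduce less near the leaves   */
    if (remaining_depth >= 8 * PLY)     /* example threshold, not tuned  */
        return base_reduction + PLY;    /* reduce more when depth is big */
    return base_reduction;
}

The 60-point figure is just the usual logistic rating formula applied to the
match score: 117/200 = 58.5%, and 400*log10(117/83) is about 60 Elo.
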
>>
>>This change was almost completely random.  I had no reason to expect that
>>the change would be an improvement; it was simply an experiment that turned
>>out to be successful.  Luck rather than skill.
>>
>>I think all engines near Rybka's and Gothmog's level can easily be
>>significantly improved by such simple changes.
>
>I guess that Crafty can also be easily and significantly improved by such simple
>changes, and probably the same is true for Fritz and Shredder.

Perhaps, but the difference is that it is much harder to find the right
changes in top engines like Fritz and Shredder, for the simple reason
that they are much better tuned.  In Gothmog, as I said, the probability
that a random (but sensible) change to some search parameter results in
an improvement is close to 50%.  In Shredder, it is presumably close to
(but of course not equal to) 0%.

Tord


