Computer Chess Club Archives



Subject: Re: Rybka's current exe size: 4 628 480 !

Author: Vasik Rajlich

Date: 14:30:51 01/30/06



On January 30, 2006 at 13:19:25, Robert Hyatt wrote:

>On January 29, 2006 at 19:41:16, Uri Blass wrote:
>
>>On January 29, 2006 at 19:29:30, Vasik Rajlich wrote:
>>
>>>On January 29, 2006 at 12:07:52, Albert Silver wrote:
>>>
>>>>On January 29, 2006 at 11:55:59, Uri Blass wrote:
>>>>
>>>>>On January 29, 2006 at 10:03:02, Albert Silver wrote:
>>>>>
>>>>>>On January 29, 2006 at 07:12:15, enrico carrisco wrote:
>>>>>>
>>>>>>>Reminds me of Deep Thought -- using the hardware for the last N plies.  This
>>>>>>>type of tactical search is very efficient at seeing danger from your opponent
>>>>>>>but less efficient at finding chances for itself (e.g. Genius).  Tactically it
>>>>>>>makes the program very strong, but not as effective in king attacks as Fritz
>>>>>>>or Hiarcs.  Hence, on test positions it does slightly worse (just like Fruit).
>>>>>>
>>>>>>Would that really be the reason? As you probably know, one can significantly
>>>>>>improve its ability with test suites, by simply increasing the 'Optimism' in the
>>>>>>outlook.
>>>>>>
>>>>>>                                           Albert
>>>>>
>>>>>Only on test suites where you need to fail high to find the move, not on test
>>>>>suites where you need to fail low.
>>>>>
>>>>>I think that one possible way to test positional understanding is the
>>>>>following:
>>>>>
>>>>>1) Use unequal time controls so that the two programs score 50% against each
>>>>>other.
>>>>>2) Take all the games where the programs disagree about which side is better
>>>>>(both programs evaluate the position as an advantage of at least 0.25 pawns
>>>>>for themselves for at least 3 consecutive moves).
>>>>>
>>>>>3) Calculate the result of those games.
>>>>>
>>>>>The program that scores better in these games probably has the better
>>>>>positional understanding.
>>>>>
>>>>>Uri
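Uri's selection rule (steps 2 and 3) is simple enough to write down as code once a data format is fixed. The sketch below assumes step 1 (time odds tuned so the match ends near 50%) has already been done, and that for every game both engines' own evaluations are available move by move, aligned by move number. The `Game` record and the function names are hypothetical, not taken from any existing tool.

```python
from __future__ import annotations

from dataclasses import dataclass

THRESHOLD_CP = 25  # "at least 0.25 pawns advantage for itself"
MIN_RUN = 3        # "for at least 3 consecutive moves"


@dataclass
class Game:
    # Hypothetical record: evals_a[i] and evals_b[i] are the scores each engine
    # reported at move i, in centipawns, each from its own point of view.
    evals_a: list[int]
    evals_b: list[int]
    result_a: float  # 1.0 = engine A won, 0.5 = draw, 0.0 = engine A lost


def has_disagreement(game: Game) -> bool:
    """True if both engines claim >= THRESHOLD_CP for MIN_RUN consecutive moves."""
    run = 0
    for a, b in zip(game.evals_a, game.evals_b):
        # Extend the run while both sides are optimistic; otherwise reset it.
        run = run + 1 if (a >= THRESHOLD_CP and b >= THRESHOLD_CP) else 0
        if run >= MIN_RUN:
            return True
    return False


def disagreement_score(games: list[Game]) -> tuple[int, float | None]:
    """Number of selected games and engine A's average score in them."""
    selected = [g for g in games if has_disagreement(g)]
    if not selected:
        return 0, None
    return len(selected), sum(g.result_a for g in selected) / len(selected)


games = [
    Game([30, 40, 50], [30, 26, 25], 1.0),  # both optimistic for 3 moves
    Game([30, 40, 50], [30, 10, 40], 0.0),  # run broken at the second move
    Game([0, 0, 0], [0, 0, 0], 0.5),        # no disagreement at all
]
print(disagreement_score(games))  # (1, 1.0)
```

Only the first game qualifies, so engine A scores 1.0 over one selected game; with real data the interesting number is how far that average drifts from 0.5.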
>>>
>>>There is one issue.
>>>
>>>Let's say that I change Rybka's eval to return eval () + 200 centipawns. Rybka
>>>will then get butchered in this test, but the overall program level would be
>>>preserved and (I would argue) the positional level would be preserved as well.
>>>
>>>In other words, is an evaluation responsible for absolute accuracy, or accuracy
>>>relative to other likely positions within the same search?
>>
>>This is a problem, so I think that programs should have a symmetric mode, in
>>which the evaluation is symmetric and depends only on the history of the game.
>>
>>Maybe something else would be slightly better for playing strength (one of my
>>ideas, which I do not use today and have not even tried, is to change the
>>evaluation slightly during the game), but at least there should be a
>>personality that uses a symmetric evaluation, one that can be based on previous
>>moves of the game but not on other factors.
>>
>>Uri
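One way to make "symmetric" testable is to require that an evaluation scored from White's point of view exactly negates when the board is mirrored and the piece colors and side to move are swapped. The sketch below uses a made-up toy evaluation (material, a small pawn-advance bonus and a hypothetical tempo bonus) with squares numbered a1 = 0 to h8 = 63; none of it is taken from a real engine, and it checks only color symmetry, not the "depends only on the game history" part of the proposal.

```python
from __future__ import annotations

from dataclasses import dataclass

PIECE_VALUE = {"P": 100, "N": 300, "B": 300, "R": 500, "Q": 900, "K": 0}
TEMPO_BONUS = 10  # hypothetical bonus for the side to move


@dataclass
class Position:
    board: dict[int, str]  # square (a1 = 0 ... h8 = 63) -> piece, uppercase = White
    white_to_move: bool


def evaluate(pos: Position) -> int:
    """Toy evaluation in centipawns from White's point of view."""
    score = 0
    for square, piece in pos.board.items():
        is_white = piece.isupper()
        value = PIECE_VALUE[piece.upper()]
        if piece.upper() == "P":
            rank = square // 8
            # 5 cp per rank advanced from the pawn's own starting rank.
            value += 5 * (rank - 1 if is_white else 6 - rank)
        score += value if is_white else -value
    score += TEMPO_BONUS if pos.white_to_move else -TEMPO_BONUS
    return score


def mirror(pos: Position) -> Position:
    """Flip the board vertically, swap piece colors and the side to move."""
    flipped = {square ^ 56: piece.swapcase() for square, piece in pos.board.items()}
    return Position(flipped, not pos.white_to_move)


def is_symmetric(pos: Position) -> bool:
    return evaluate(mirror(pos)) == -evaluate(pos)


example = Position({4: "K", 60: "k", 3: "Q", 28: "P", 51: "p"}, white_to_move=True)
print(evaluate(example), evaluate(mirror(example)), is_symmetric(example))  # 920 -920 True
```

A side-to-move bonus passes this check: symmetry constrains how the two colors are treated, not whether the score leans toward whoever is on move.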
>
>
>You can be 100% symmetric but still have a "happy" or "pessimistic" evaluation.
>Crafty is now 100% symmetric in fact, because I wanted to see if I can tune it
>to work satisfactorily that way.  It would not be uncommon for the side on move
>at the root to always have a slight positional edge, since they get to move
>first, and the opponent mainly reacts to that move.  There have been lots of
>examples of odd/even problems, where an odd-ply search, since the root side gets
>one extra move, produces a score that is biased in favor of the root side.  Then
>the even-ply searches bring it back down.  But both are correct scores for the
>tree being searched, as one ply can always change the score.
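A toy model makes the odd/even effect concrete. The sketch below assumes, purely for illustration, that in every position each side has a neutral move and a move worth +10 centipawns to the mover, with an otherwise perfectly symmetric evaluation. A plain fixed-depth negamax then returns +10 at odd depths (the root side got the extra move) and 0 at even depths, and each value is the correct minimax score of its own tree.

```python
# Toy model (hypothetical): in every position each side has two moves,
# a neutral one (0 cp) and one that improves the mover's position by 10 cp.
MOVE_GAINS = (0, 10)


def negamax(depth: int, score: int) -> int:
    """Return the search score from the side to move's point of view.

    `score` is the accumulated static evaluation, also from the side to
    move's point of view.
    """
    if depth == 0:
        return score
    best = float("-inf")
    for gain in MOVE_GAINS:
        # The mover improves its own score by `gain`; the opponent then sees
        # the negated score, and the child's result is negated back.
        best = max(best, -negamax(depth - 1, -(score + gain)))
    return best


for depth in range(1, 7):
    print(depth, negamax(depth, 0))  # 10 at odd depths, 0 at even depths
```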

Let me elaborate on this a bit as I also put some thought into it.

Any time you are "too optimistic" or "too pessimistic" in any position, you move
your evaluation away from perfect, and weaken the program. It's just that some
types of inaccuracies are much easier to live with.

For example, let's say that your program always evaluates a hedgehog-type
middlegame position as very good for white. (Many programs do this.) You can
still often play decent chess, because the pawn structure isn't changing, while
your other terms cause your program to favor good moves within that pawn
structure. The optimism or pessimism isn't really hurting, except in the fairly
rare case where you have a chance to get out of the pawn structure entirely.

This is an example of an evaluation error which doesn't really hurt positional
play _that_ much.

The evaluation errors that will really kill you are related to terms that
frequently change within a single search.

Uri - this is the main problem with your test. It will over-emphasize evaluation
terms which don't frequently change within a single search.
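The distinction can be checked on a toy 2-ply tree. In the sketch below, the leaf values, the `hedgehog` flag and the 150 cp misjudgement are all invented for illustration. A constant +200 offset never changes White's choice, and neither does the hedgehog bias while every leaf keeps the structure; once one root move leaves the structure, the same bias flips the choice.

```python
from __future__ import annotations

from typing import Callable, NamedTuple


class Leaf(NamedTuple):
    score: int      # "true" value in centipawns, from White's point of view
    hedgehog: bool  # is the hedgehog pawn structure still on the board?


def best_root_move(tree: dict[str, list[Leaf]], evaluate: Callable[[Leaf], int]) -> str:
    """2-ply minimax: White picks the move whose worst reply scores highest."""
    return max(tree, key=lambda move: min(evaluate(leaf) for leaf in tree[move]))


def true_eval(leaf: Leaf) -> int:
    return leaf.score


def offset_eval(leaf: Leaf) -> int:
    # The "eval() + 200" thought experiment: same error at every leaf.
    return leaf.score + 200


def hedgehog_biased_eval(leaf: Leaf) -> int:
    # Hypothetical misjudgement: the hedgehog is overrated by 150 cp for White.
    return leaf.score + (150 if leaf.hedgehog else 0)


# Every leaf keeps the hedgehog: the bias is constant within this search.
STABLE = {
    "plan_a": [Leaf(20, True), Leaf(5, True)],
    "plan_b": [Leaf(30, True), Leaf(-10, True)],
}

# One root move leaves the structure: the biased term now differs between leaves.
CHANGING = {
    "keep_structure": [Leaf(5, True), Leaf(0, True)],
    "break_out": [Leaf(60, False), Leaf(40, False)],
}

for name, tree in (("STABLE", STABLE), ("CHANGING", CHANGING)):
    for ev in (true_eval, offset_eval, hedgehog_biased_eval):
        print(name, ev.__name__, best_root_move(tree, ev))
```

In `STABLE` all three evaluations pick `plan_a`. In `CHANGING` the true and offset evaluations pick `break_out`, while the hedgehog-biased one picks `keep_structure`, even though its absolute error is smaller than the +200 offset's.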

Not really sure if I'm making myself clear, but I am very sure of what I'm
trying to say. :)

One more thought: another way to measure the positional strength of an engine is
to measure the tactical strength, and subtract it (somehow) from the full
strength. This might be an easier way to approach the problem, although it's not
clear how to measure tactical strength either. (Test suites are certainly not the
way.)

Vas




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.