Computer Chess Club Archives



Subject: Re: Evaluation comparative test for Amateur Engines (PROPOSAL)

Author: Gerd Isenberg

Date: 00:17:50 02/21/04



On February 20, 2004 at 19:08:13, Jaime Benito de Valle Ruiz wrote:

>>It is interesting.
>>
>>But does comparing final eval-scores make sense, even with a normalized range?
>>Programs, I mean search and eval, are designed too differently.
>>There is no or only vague information (piece interaction) about all the zillions
>>of aspects/patterns and weights of the evaluation, only a final eval result.
>>
>>Eval scores of complex positions with a lot of important aspects are difficult
>>to interpret. E.g. king safety issues for both sides, passers, "active piece
>>play", static/dynamic pawn structures, unbalanced material, weak/strong squares
>>and their interactions ...
>>
>>I prefer to discuss some eval aspects, interactions and vague implementation
>>hints with concrete positions from time to time...
>>
>>Programmers don't like to share all their eval tricks for obvious reasons ...
>>
>>Cheers,
>>Gerd
>
>You're right.
>
>I'm not asking for everyone to give out their "tricks", but to give some
>figures, and give everyone else a "stand-pat" for their eval. If your engine
>gave +2.00 in a situation, but all the strongest commercial engines gave
>something around -1.00 for the same position... wouldn't you at least feel
>interested about why the difference is so great?

Of course I am ;-)
That was the interesting thing with Tord's position - even without queens.

>My idea is not to give an
>ultimate score for a list of positions, but to find an automated way to find
>"strange" disagreements in scores, and give you a chance to tweak your engine
>more efficiently. I think that if we all contribute, we could easily come up
>with a fairly big database of positions where lots of programmers can spot
>serious flaws in specific positions for their evaluation functions.
>
>If there's a serious disagreement in any particular position, an interesting
>thread can be started about it, no doubt.
>
>Non-profit contributions, and suggestions are more than welcome in this respect.
>My engine is still far far away from most of the ones here, and I'd really pay
>to have information such as this available to test my engine. I'm sure that
>other people with much much better engines than mine wouldn't mind giving it a
>try, just in case.
>
>Maybe we could refine the test, so independent pieces of evaluation (such as
>pawn structure, king's safety, etc...) can be included separately.
>
>I'm talking about improving our resources, not giving out secrets.
>
>Please give any suggestions you think are relevant; I'm hoping to learn from
>you... as well as contributing in the future, if I can.
>
>Regards,
>
>  Jaime


My point with isolated positions is that you can't judge the "quality" of
an eval just by looking at final scores. How general and differentiated is the
knowledge of an engine? Is that long or short range knowledge, are some features
lost forever or only temporarily?

So maybe some set of similar positions with one main aspect and recombined
interactions, like making a blocked pawn mobile or changing the material, is
fine. Or some successive positions along a rather forced line...
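
One way such a set could be used is to compare score changes inside the set
rather than absolute scores, so that differently scaled evals still line up.
The lines below are only a rough sketch under my own assumptions: the function
name, the position labels, the engine names, the 1.00 pawn threshold and all
the numbers are made up for illustration, and scores are assumed to be in
pawns from White's point of view.

```python
from statistics import median

def flag_delta_disagreements(my_scores, reference_scores, base, threshold=1.0):
    """Return positions whose score change relative to `base` differs from the
    median change of the reference engines by more than `threshold` pawns."""
    flagged = []
    for pos, my_score in my_scores.items():
        if pos == base:
            continue
        # How much my eval moves when going from the base position to this variant.
        my_delta = my_score - my_scores[base]
        # The same change as seen by each reference engine.
        ref_deltas = [scores[pos] - scores[base]
                      for scores in reference_scores.values()]
        ref_delta = median(ref_deltas)
        if abs(my_delta - ref_delta) > threshold:
            flagged.append((pos, my_delta, ref_delta))
    return flagged

# Hypothetical scores for one base position and two variants of it.
my_scores = {"base": 0.20, "passer_unblocked": 0.30, "minor_exchanged": 0.10}
reference_scores = {
    "engine_a": {"base": -0.50, "passer_unblocked": 1.00, "minor_exchanged": -0.60},
    "engine_b": {"base": 0.10, "passer_unblocked": 1.30, "minor_exchanged": 0.05},
    "engine_c": {"base": 0.00, "passer_unblocked": 1.60, "minor_exchanged": -0.20},
}

flagged = flag_delta_disagreements(my_scores, reference_scores, "base")
for pos, mine, ref in flagged:
    print(f"{pos}: my change {mine:+.2f}, reference median change {ref:+.2f}")
```

With these invented numbers only the variant with the unblocked passer is
flagged: the reference engines raise their score a lot once the pawn is
mobile, while the tested eval barely reacts. That points at one eval aspect
instead of a final score that is hard to interpret.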

Cheers,
Gerd




Last modified: Thu, 15 Apr 21 08:11:13 -0700
