Computer Chess Club Archives


Subject: Re: Why "positional" test positions are physically impossible! WM Test

Author: Robert Hyatt

Date: 14:20:45 12/03/02


On December 03, 2002 at 16:33:48, Uri Blass wrote:

>On December 03, 2002 at 15:52:33, Robert Hyatt wrote:
>
>>On December 03, 2002 at 13:41:24, Uri Blass wrote:
>>
>>>On December 03, 2002 at 12:54:24, Rolf Tueschen wrote:
>>>
>>>>Until now nobody out of the programmer group had ever spoken about that evident
>>>>truth. SMK says that these tests can't show the strength of play or as it was
>>>>claimed for this test, the "ability to analyse". SMK also explained (for the
>>>>first time in that direct speech) how he and every programmer could fake the
>>>>results of such tests. He then speaks about the question if it could be
>>>>discovered, as it was by T. Mally in the case of Ed Schröder, and he said that of
>>>>course he could do it so that nobody could find out. In fact he had written such
>>>>a "tool", but in the end he decided to leave it out of the commercial product.
>>>>
>>>>But all this gives me the opportunity to talk about the reasons why such a
>>>>testing with even these top class positions is nonsense. And why it has nothing
>>>>to do, well, almost nothing, with _real_ strength.
>>>>
>>>>I think I can show you why, especially for those allegedly positional positions,
>>>>the test is nonsense and that it is measuring something other than the analysing
>>>>power of the engine.
>>>>
>>>>I will keep it very short so that you can do your own research.
>>>>
>>>>(Just to mention that I asked for that problem already two years ago as
>>>>'Schachfan' in CSS forum, but then it went about a tactical mate position).
>>>>
>>>>Look, if you have a positional game of chess, where do you choose the point for
>>>>a test? Of course, in this WM-Test of Gurevich et al you take the position when
>>>>exactly a certain by the experts well commented and mostly beautiful move has
>>>>been made. Because there the commentators said: only with this move he could
>>>>preserve the slight advantage. But the truth is that often the engines see - in
>>>>their actually possible realm - two solutions very closely together. And in
>>>>general it could be said that for positional positions without tactics the evals
>>>>are not very impressive at all. So, how could you calculate it in your results?
>>>>Would you really take a difference of 0.01 points as decisive? Is that relevant?
>>>>
>>>>But the main problem of such test positions is this.
>>>>
>>>>The point of that "nice move" (that caught the attention of the commentators) is
>>>>by no means the most important moment for the decision making. Let me explain
>>>>the irony. The usual commentators are masters themselves. Well, and therefore
>>>>they take certain decisions as completely normal, because they are easy and
>>>>trivial for _them_, but not so for the amateurs. Or the machines so to speak.
>>>>But now go with me backwards a few moves. How optimistic are you that we could
>>>>then expect that a machine would be better prepared to make the right decision
>>>>in such _positional_ games? And that is exactly the point for these test
>>>>positions. _Realistically_ we had to test the machines in positions, where only
>>>>experienced humans know how to play to be later in the position to make some
>>>>"decisive" moves, moves then commented by our experts. Only the early positions
>>>>would allow a verdict if our actual machines could play positional chess. We know
>>>>already the answer. They can't for the moment.
>>>>
>>>>But therefore such tests with such a great pretension are a fake, a hoax in
>>>>themselves. And Stefan MK explained it with the possible distinction. In reality
>>>>M. Gurevich is making a question of life or death out of it. But earlier
>>>>somewhere I already mentioned that it's ridiculous to claim the honor for a so
>>>>called, guess that, I translate, World Champion Test. These positions are simply
>>>>taken from Wch matches. What a thrill! But it's known for ages that the chess of
>>>>these matches is not always the best possible. Because it's mainly a
>>>>psychological fight. And fortunately Gurevich didn't claim that he were testing
>>>>psychology. But just now it was published that one position wasn't from Wch
>>>>chess at all. A game between Anand and Shirov. And to make the scandal even
>>>>greater. The authors used a false position. The K stood on c7, but they put
>>>>him on d7. But with Kc7 we have two solutions. The searched Ng5 and now the odd
>>>>Bg5 too. Christ! A whole life work of a few hours of choosing some positions out
>>>>of Wch games is in danger to lose all reputation. Doctor doctor, gimmi the
>>>>news...!
>>>>
>>>>Rolf Tueschen
>>>>
>>>>On December 03, 2002 at 09:26:42, Eduard Nemeth wrote:
>>>>
>>>>>Very interesting post from SMK in CSS Forum (only german).
>>>>>
>>>>>Please read it, I think that a translation is interesting for you!
>>>>>
>>>>>Read here:
>>>>>
>>>>>http://f23.parsimony.net/forum50826/messages/54995.htm
>>>
>>>positional test suites are not impossible.
>>>
>>>I think that the known test suites are not good for that purpose and I also
>>>believe that it is not easy to build them so I prefer tactical test suites.
>>>I believe that there is a lot of room for improvement in tactics.
>>>
>>>positional test suites should not be always positions that are hard for humans
>>>and they may include also positions that are easy for humans but hard for part
>>>of the computers.
>>>
>>>
>>>A possible way to build them may be to analyze a lot of games of computers from
>>>the ssdf games and find the positional mistakes that were done by the programs
>>>and the target can be to avoid the mistakes.
>>
>>
>>
>>you miss the point.  For a tactical position, it is easy to show that the
>>winning tactical idea is correct and winning beyond a doubt.  For a positional
>>test position, the program can make the right move for the right reason, or it
>>might make it for the wrong reason, but both get the same score.  About the
>>only way to do this is to create positions where there are attractive (but
>>wrong) moves that could be played, and see if the program plays them.  If it
>>plays a bad move, it clearly doesn't understand the issue.  If it plays the
>>right move, you only know that it doesn't appear to not understand things, but
>>it also could just be lucky.
>>
>>>
>>>The main problem is to agree about the positional mistakes.
>>>
>>>There are a lot of cases when computers can translate positional advantage that
>>>they do not understand to positional advantage that they understand so if most
>>>of the programs agree after a long search that the move is correct then it is
>>>going to be an evidence that the move is correct.
>>>
>>>You can still define the move as positional move because the tactic that
>>>computers see after a long search is not tactics of winning material but winning
>>>better pawn structure or better mobility.
>>
>>But a program without mobility analysis can still make the right move for the
>>wrong reason so the test will be worthless...
>
>It may happen in one position but if the test has enough positions from
>a lot of games then I do not expect a program that knows nothing to be always
>lucky.

No, but how can you use the resulting data to compare your program with
another?  He might have more knowledge, but get a worse score, because a
few of your positional terms happen to turn you in the right direction for
a specific test.

Or he knows more, but he gets the same score.  The results don't show what
is different between the two engines.  A program needs to make a move for
the right reason in order to claim to "get it right".  Even in tactical
positions this often doesn't happen.
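
To make that concrete, here is a minimal Python sketch. Everything in it is
hypothetical: the three positions, the expected moves, the "key idea" labels and
the two engines are invented for illustration and come from no real test suite.
A normal suite only sees the move played, so both engines get the same score
even though only one of them chose its moves for the right reason.

```python
from dataclasses import dataclass

# Hypothetical three-position suite: the expected move and the positional
# idea that is supposed to justify it. None of this is real test data.
EXPECTED_MOVE = {"p1": "Nd5", "p2": "Rc1", "p3": "h4"}
KEY_IDEA = {"p1": "outpost", "p2": "open_file", "p3": "space"}


@dataclass(frozen=True)
class Result:
    position: str
    played: str
    reason: str  # evaluation term that drove the choice (invisible to a real suite)


def suite_score(results):
    """What a test suite reports: number of positions with the expected move."""
    return sum(r.played == EXPECTED_MOVE[r.position] for r in results)


def right_reason_score(results):
    """Expected move AND the intended idea behind it (not observable in practice)."""
    return sum(
        r.played == EXPECTED_MOVE[r.position] and r.reason == KEY_IDEA[r.position]
        for r in results
    )


# Engine A knows about outposts and open files.
engine_a = [
    Result("p1", "Nd5", "outpost"),
    Result("p2", "Rc1", "open_file"),
    Result("p3", "g3", "king_safety"),
]
# Engine B knows neither, but unrelated terms push it to the same moves.
engine_b = [
    Result("p1", "Nd5", "piece_square_bonus"),
    Result("p2", "Rc1", "rook_mobility"),
    Result("p3", "g3", "king_safety"),
]

print(suite_score(engine_a), suite_score(engine_b))                # 2 2
print(right_reason_score(engine_a), right_reason_score(engine_b))  # 2 0
```

The second number is exactly what a move-only test cannot measure, which is why
the scores alone do not tell you what is different between the two engines.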

>
>I think that the right positional test may be productive to test changes in
>program's evaluation but the test is not enough and programmers should do more
>tests.
>
>The side that makes more mistakes is not necessarily the side that loses in chess.
>The importance of one big mistake may be bigger than the importance of 2 small
>mistakes.
>
>It is possible that a change in the evaluation produces fewer mistakes but
>does not make the program better because the new big mistakes are more important
>than the old small mistakes.
>
>Uri
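
A small Python sketch of the arithmetic behind that last point. The mistake
costs are made-up numbers in pawns, assumed only for illustration; nothing here
was measured on any program. The changed evaluation makes fewer mistakes but
loses more in total.

```python
def summarize(mistake_costs):
    """Return (number of mistakes, total cost in pawns) for a list of costs."""
    return len(mistake_costs), round(sum(mistake_costs), 2)


# Hypothetical costs, in pawns, of the mistakes made over the same set of games.
old_eval = [0.15, 0.15]  # two small mistakes
new_eval = [0.80]        # one big mistake

old_count, old_total = summarize(old_eval)
new_count, new_total = summarize(new_eval)

print(old_count, old_total)  # 2 0.3
print(new_count, new_total)  # 1 0.8

# Counting mistakes says the new evaluation is better; total cost says otherwise.
fewer_mistakes = new_count < old_count
costs_more = new_total > old_total
print(fewer_mistakes, costs_more)  # True True
```

A test that only counts wrong moves would prefer the new evaluation here, so a
mistake count by itself is not enough to judge an evaluation change.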



Last modified: Thu, 15 Apr 21 08:11:13 -0700
