Computer Chess Club Archives



Subject: Re: Results of 112 engines in test suite "WM-Test" (100 pos) for download

Author: Vincent Diepeveen

Date: 11:40:28 08/17/02



On August 17, 2002 at 09:53:25, Uri Blass wrote:

Both have valid points, but the main thing, which is not really clear to
most people, is that this testset is very weird: the worst positional
middlegame program, Fritz 7, is number 1 at the middlegame positions,
and the worst endgame program, Gambit Tiger 1.0, has the highest score
on the endgame problems.

So the testset is a bit funny.

The reason for this is that the vast majority of positions require
'patzer moves'. In the case of the middlegames, the vast majority of
positions require aggressive handling of the opponent's king safety,
something Fritz does well.

In the case of the endgames, huge scores for obvious advantages solve
everything there. That's why Tiger is doing so well: just don't care about
that pawn and promote your passer! Give a 3-pawn bonus for passed pawns on
the 3rd rank and even more for pawns on the 2nd rank.
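
To make the idea concrete, here is a minimal sketch of such an evaluation
term. This is not Gambit Tiger's actual code; the bonus values are made up
for illustration only:

# Sketch of a rank-scaled passed-pawn bonus, in centipawns.
# The values are invented; the point is only that the bonus grows
# so fast that sacrificing a pawn to push the passer is almost
# always "worth it" to the engine.
PASSER_BONUS = {        # rank of a black pawn; mirror for white
    7: 10, 6: 20, 5: 50, 4: 100,
    3: 300,             # the "3 pawns bonus" on the 3rd rank
    2: 500,             # even more one step from promotion
}

def passed_pawn_bonus(rank: int) -> int:
    """Return the bonus in centipawns for a passed pawn on `rank`."""
    return PASSER_BONUS.get(rank, 0)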

Things like that. The testset measures the aggressiveness of engines;
it does not indicate how strong the engines are at all.

For example, the number of queenside positions is very small. You can
count them on one hand, and even those positions require a patzer
move.

The other problem, that there are only 'best moves' to find and no
'avoid moves', is also real. Nevertheless it is not as major a problem
as the patzer problem.
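
For what it's worth, scoring a suite that used both kinds of positions
would be straightforward: in the standard EPD format, 'bm' is the
best-move opcode and 'am' the avoid-move opcode. A sketch using the
python-chess library (the pick_move callable is hypothetical; plug in
whatever engine or search you want to test):

import chess

def score_epd_suite(path: str, pick_move) -> int:
    """Score an EPD suite that may use both 'bm' and 'am' opcodes.

    pick_move is any callable mapping a chess.Board to the move the
    engine under test would play (hypothetical -- plug in your own).
    """
    solved = 0
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue
            board = chess.Board()
            ops = board.set_epd(line)   # sets the position, returns opcodes
            move = pick_move(board)
            if "bm" in ops and move not in ops["bm"]:
                continue                # missed the required best move
            if "am" in ops and move in ops["am"]:
                continue                # played the move to avoid
            solved += 1
    return solved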

How can an engine that, compared to the other top engines, doesn't know a
thing about bishop versus knight be #1 on the positional testset?

These guys of course make the same mistakes as the others.
We have seen this before, of course. The GS2930 testset is a good example
of another testset where just giving away a pawn for 2 checks earns
the engine in question a high score.

That really shows a problem in all these testsets. In that respect
even the hardest work is useless, of course.

When I saw this testset I was happy that it had new positions, but
the claims of the authors are not realistic.

The testset measures how aggressive the programs are at the positions
in the testset. It measures nothing that has to do with the real strength
of engines.

>On August 17, 2002 at 08:37:13, Albert Silver wrote:
>
>>On August 17, 2002 at 05:11:01, Uri Blass wrote:
>>
>>>On August 16, 2002 at 17:39:48, Manfred Meiler wrote:
>>>
>>><snipped>
>>>>For all the other engines mentioned, it seems to me that this test suite is
>>>>too hard for them.
>>>>The WM-Test was designed by its authors, Gurevich/Schumacher, for engines
>>>>with playing strength in the range of 2500 - 2800 ELO (see the attached readme
>>>>file) - from this point of view (ELO >= 2500) I shouldn't have tested many of
>>>>these 112 engines... :-)
>>>
>>>Did they check carefully that the test is correct?
>>>
>>>When I look at the pgn, it seems that the lines are not convincing
>>>for computer programs.
>>>
>>>For example, in position 3 they give the line
>>>1.Nf5 gxf5 2.gxf5 Nc7 3.Rg1 Ne8 and say no word about the typical computer
>>>move Rf7.
>>>
>>>After Ne8 yace can immediately see a small advantage for white:
>>>Bxh6 0.54/10, 0.65/11, 0.54/12
>>>but after Rf7 it says Rg6 -0.37/10, Rxg7+ -0.28/10,
>>>Rxg7+ -0.47/11, Rg6 -0.30/11, Rg2 -0.21/11,
>>>Rg2 -0.22/12
>>>
>>>Maybe it can see an advantage for white after long analysis
>>>(I did not try it), but I think it is better to give some
>>>tree to convince programs that the moves are correct. (At least in part
>>>of the cases I cannot use the pgn together with yace's learning to prove
>>>in a reasonable time that the solutions are winning moves, and I often
>>>cannot even convince it that the move to find is the best move.)
>>>
>>>Uri
>>
>>I put similar questions to one of the actual authors, Gurevich, in the
>>Chessbase forum, but with little effect. I showed him that several engines could
>>reach the correct move, and keep it, for reasons that had nothing to do with the
>>solution. For example, in one position IIRC, the key _winning_ move was Re3, yet
>>several engines chose it because they thought it led to a perpetual check and
>>considered the playing side to be slightly inferior. I thought that although one
>>can argue the move was played, if the suite is to show an engine's competence in
>>finding _winning_ moves, it should of course be clear that that is why the engine
>>played it. I argued merely that in such a case the position should not be part
>>of the suite, since it wasn't clear the engine would actually win the game. There
>>was at least one other case where an engine chose the winning move for reasons
>>that had nothing whatsoever to do with the winning line, and the line the engine
>>gave showed it had no understanding of why the move was great. In other words,
>>it might play this wonderful move, a key to winning the position, but still not
>>win the game. Again, in my eyes this meant it was poorly suited to be part of
>>a test suite. I still thanked him for the wonderful effort. The author was very
>>courteous, but treated my arguments as clever rhetoric and ignored them.
>>
>>                                        Albert
>
>
>I consider wrong solutions a bigger problem than a solution found
>for the wrong reason, and the first question is whether all the solutions are
>right (they may be right, but the authors often did not give convincing
>evidence for it by a tree).
>
>I can understand that there may be cases when it is practically
>impossible to prove the solution by a tree, because the tree
>that is needed is too big if the program lacks
>the right knowledge (for example if the program does not know that
>KBP vs KPP is a draw when the stronger side has the wrong bishop),
>but I expect to see in the pgn
>at least some games against the top programs in cases where
>the result is not obvious to humans.
>
>In the third position there is no line given that can convince
>humans or computers that white is winning, and they give no analysis after
>1.Nf5 gxf5 2.gxf5 Nc7 3.Rg1 Rf7
>
>It is clear that white has a strong attack, but it is
>not a position where humans can be sure without analysis that
>white is winning.
>
>Computers may find that white is winning after hours of analysis (I did not try)
>
>Here is the third position before 1.Nf5
>
>[D]3r1r2/pp1q2bk/2n1nppp/2p5/3pP1P1/P2P1NNQ/1PPB3P/1R3R1K w - -
>
>Uri
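
For anyone who wants to check this position themselves, below is a sketch
using the python-chess library that asks a UCI engine for its move and
score. The engine path is a placeholder and the +1.5 pawn "convincing"
threshold is an arbitrary choice; the point, following Albert's objection
above, is that finding 1.Nf5 alone is not enough -- the score should also
claim a win.

import chess
import chess.engine

# The diagram FEN from above, with move counters added.
FEN = "3r1r2/pp1q2bk/2n1nppp/2p5/3pP1P1/P2P1NNQ/1PPB3P/1R3R1K w - - 0 1"

def check_position(engine_path: str, seconds: float = 60.0) -> None:
    board = chess.Board(FEN)
    with chess.engine.SimpleEngine.popen_uci(engine_path) as engine:
        info = engine.analyse(board, chess.engine.Limit(time=seconds))
        best = info["pv"][0]
        score = info["score"].white()
        found = best == chess.Move.from_uci("g3f5")   # 1.Nf5
        # Finding the move is not enough: the score should also claim
        # a clear white win, otherwise the engine may have chosen Nf5
        # for reasons unrelated to the solution.
        convincing = score.is_mate() or (score.score() or 0) > 150
        print("move:", board.san(best), "- solution found:", found)
        print("score:", score, "- claims a clear advantage:", convincing)

check_position("/path/to/engine")   # placeholder engine path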


