Computer Chess Club Archives


Subject: Re: Results of 112 engines in test suite "WM-Test" (100 pos) for download

Author: Uri Blass

Date: 13:29:47 08/17/02

On August 17, 2002 at 14:40:28, Vincent Diepeveen wrote:

>On August 17, 2002 at 09:53:25, Uri Blass wrote:
>
>Both have valid points, but the main thing is not really clear to most
>people: this test set is strange in that the weakest positional middlegame
>program, Fritz 7, is number one on the middlegame positions, while the
>weakest endgame program, Gambit Tiger 1.0, has the highest score on the
>endgame problems.
>
>So the test set is a bit funny.
>
>The reason for this is that the vast majority of positions require
>'patzer moves'. In the middlegame positions, most require aggressive
>play against the opponent's king safety, something Fritz does well.
>
>In the endgame positions, huge scores for obvious advantages solve
>everything, which is why Tiger does so well there. Just don't worry about
>that pawn and promote your passer! Give a three-pawn bonus for passed
>pawns on the 3rd rank and even more on the 2nd.
>
>Things like that. The test set measures the aggressiveness of engines;
>it does not indicate how strong they are at all.
>
>For example, the number of queenside positions is very small. You can
>count them on one hand, and even those require a patzer move.
>
>The other problem, that there are only 'best moves' to find and no
>'avoid moves', is also real. Nevertheless, it is not as serious as the
>patzer problem.
>
>How can an engine that, compared to the other top engines, knows nothing
>about bishop versus knight be number one on a positional test set?
>
>These authors of course make the same mistakes as others; we have seen
>this before. The GS2930 test set is a good example of another test set
>where simply giving away a pawn for two checks earns the engine in
>question a high score.
>
>That really shows a problem in all these test sets. In that respect,
>even the hardest work is of course useless.
>
>When I saw this test set, I was happy that it had new positions, but
>the authors' claims are not realistic.
>
>The test set measures how aggressively the programs play in its
>positions. It measures nothing that relates to the real strength of
>engines.

I prefer a test suite that is simply based on mistakes
of computers in comp-comp games.

No chess knowledge is needed to compose such a test.

People can take Yace and search for
tactical mistakes in comp-comp games by computer analysis.

I suggest using Yace for the analysis because it is not
a root processor and can learn from its previous search.

People can let it analyze every interesting position (one where a short
search shows no more than a two-pawn advantage for either side) for one
hour, and look for positions where the change in the evaluation suggests
that the move played was a tactical mistake.

They can later check each candidate by deep analysis with other programs,
and if the other programs also agree that the move was a mistake, the
position can go into the test suite.
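
Below is a minimal sketch of the screening step, assuming the python-chess
library and a generic UCI engine; the engine path, PGN filename, time limit,
and centipawn thresholds are illustrative placeholders, not Yace-specific
settings. It flags moves after which the evaluation, seen from the mover's
side, drops sharply in a roughly balanced position:

import chess
import chess.engine
import chess.pgn

ENGINE_PATH = "/usr/local/bin/stockfish"  # placeholder: any UCI engine
LIMIT = chess.engine.Limit(time=1.0)      # short screening search per position
BALANCED = 200  # centipawns: only screen positions within +/- two pawns
DROP = 100      # centipawns: evaluation drop that flags a candidate mistake

def candidate_mistakes(pgn_path):
    """Yield (fen, san) for moves after which the mover's score collapses."""
    with chess.engine.SimpleEngine.popen_uci(ENGINE_PATH) as engine, \
         open(pgn_path) as f:
        while (game := chess.pgn.read_game(f)) is not None:
            board = game.board()
            for move in game.mainline_moves():
                mover, fen, san = board.turn, board.fen(), board.san(move)
                before = engine.analyse(board, LIMIT)["score"].pov(mover)
                board.push(move)
                after = engine.analyse(board, LIMIT)["score"].pov(mover)
                b = before.score(mate_score=10000)
                a = after.score(mate_score=10000)
                if abs(b) <= BALANCED and b - a >= DROP:
                    yield fen, san

if __name__ == "__main__":
    for fen, san in candidate_mistakes("compcomp.pgn"):
        print(fen, "am", san)

Only the positions flagged here would then get the long verification
search with other programs described above.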

I can give one example from a game of Movei,
Bestia 0.88-Movei (3rd division):
r6k/5r1p/1p2pP2/3p1p2/p1bB1P2/4P3/PP4RP/R5K1 b - - 0 25 am b5

Movei played b5 here and lost the game.

Black's position is not good,
but b5 loses immediately after Kf2, whereas I believe
things are less simple after moves like Rg8.
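
For readers unfamiliar with the notation, "am b5" above is the standard
EPD "avoid move" opcode: an engine fails the position if it plays b5.
A minimal sketch of checking one engine against such a line, again
assuming python-chess and a placeholder UCI engine path (the ten-second
limit is arbitrary, and the halfmove/fullmove counters are dropped
because plain EPD omits them):

import chess
import chess.engine

EPD = "r6k/5r1p/1p2pP2/3p1p2/p1bB1P2/4P3/PP4RP/R5K1 b - - am b5;"

board = chess.Board()
ops = board.set_epd(EPD)   # sets up the position and parses the opcodes
avoid = ops["am"]          # list of moves the engine must NOT play

with chess.engine.SimpleEngine.popen_uci("/usr/local/bin/stockfish") as engine:
    result = engine.play(board, chess.engine.Limit(time=10.0))
    verdict = "failed" if result.move in avoid else "passed"
    print("Engine played", board.san(result.move), "-", verdict)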

Uri


