Computer Chess Club Archives

Subject: Re: WM test bugs

Author: Vincent Diepeveen
Date: 08:41:02 06/29/02
On June 29, 2002 at 11:02:34, Manfred Meiler wrote:

I asked the makers a few questions.

To sum up some points:

 a) why is Tiger #1 on the endgame testsuite, when
    Tiger1 is the worst endgame program among the commercials,
    in contrast to Tiger2, which seemingly does somewhat
    better there?
 b) why is Fritz ranked #1 on the positional test positions, even
    though it is one of the weakest positional programs in the
    commercial range, because of a lack of knowledge?
 c) I didn't question the patzer moves in the king safety testset;
    it is known that Fritz always goes for the opponent's king.
 d) the testsuite doesn't have positions where a program must avoid
    patzer moves; instead it only gives bonuses for making patzer moves.
 e) 50% of the games get decided on the left, the other 50% on the right.
    The left is called the queenside. Out of 100 positions or whatever,
    perhaps a few have to do with the queenside, and even then it's about
    big patzer moves, not positional moves.

The only answer I got was: "well it is the best program so who cares".
Reality is that *any* analysis they give about a position is done with
a program which forward prunes positional moves like hell (Fritz). I
measure it pruning positional lines up to 7 ply forward. Rook manoeuvres,
knight manoeuvres that do *not* go in the direction of the king:
it is missing all kinds of positional moves.

Also, I asked for some analysis. These guys have *never* shown a
line which they didn't analyze with Fritz. They *only*
analyze with Fritz 7.

The only good thing about the testset is that they made an effort to
collect some positions as played by the ex-world champs;
usually new testsets consist of positions I have already seen 100 times.

Yet the reality is that the accuracy of their analysis is based upon Fritz 7;
this is the *main* problem. Of course, for someone who isn't a titled,
actively playing player and hasn't played active chess in the past years,
it's hard to get an objective analysis of a position. But checking
a position for correctness with Fritz 7 alone obviously colours a
testset.

Note that the testset is called the 'CSS' WM testset. Talking about attaching
a name. I remember that ChessBase started to sell, not so long ago, a
program which is four-time world champion (1997, 1999, 2000, 2001).
My German isn't that bad, but "nearly as good as Fritz" was the
highest honour they *ever* gave a program.

Lacking world titles, despite very good preparation on their side,
they obviously need other means to push their program. Amazingly, a few
amateurs who know very little about what chess is are serving them
well.

If you call your thing 'wmtest set', then you should also include positions
where a program must AVOID moves. Basically 99% of the testset should be
like that.

Let me give you a simple example of a position where you
have a simple BM: 1.d4 d5 2.c4 e5 3.dxe5, then let engines
search after that.

The BM here is simply d5-d4.

Amazingly not so simple for chess programs.
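
For readers who want to try the position above on their own engines, here is a minimal sketch of how it could be written as an EPD record and scored, together with an avoid-move record of the kind asked for earlier in this post. The `bm` (best move) and `am` (avoid move) opcodes are standard EPD. Everything else is an assumption of this sketch and not part of the WM test: the function names `parse_epd`, `passes` and `suite_score`, the `id` labels, and the second record, a Fool's Mate trap chosen purely for illustration. The parser is deliberately naive and does not handle quoted operands that contain spaces or semicolons.

```python
# Hypothetical sketch: score engine moves against EPD "bm" / "am" records.

# Position after 1.d4 d5 2.c4 e5 3.dxe5, Black to move, best move d5-d4.
ALBIN = 'rnbqkbnr/ppp2ppp/8/3pP3/2P5/8/PP2PPPP/RNBQKBNR b KQkq - bm d4; id "albin";'

# Illustrative avoid-move record (not from the WM test):
# after 1.f3 e6, White must avoid 2.g4 because of 2...Qh4 mate.
FOOLS = 'rnbqkbnr/pppp1ppp/4p3/8/8/5P2/PPPPP1PP/RNBQKBNR w KQkq - am g4; id "fools";'


def parse_epd(line):
    """Return (position, ops): the four position fields joined by spaces,
    and a dict mapping each opcode to its list of operands."""
    fields = line.split(None, 4)
    position = " ".join(fields[:4])
    ops = {}
    if len(fields) > 4:
        for op in fields[4].split(";"):
            op = op.strip()
            if op:
                opcode, _, operands = op.partition(" ")
                ops[opcode] = operands.split()
    return position, ops


def passes(ops, engine_move):
    """True if the move is one of the bm moves (when bm is given)
    and is none of the am moves (when am is given)."""
    if "bm" in ops and engine_move not in ops["bm"]:
        return False
    if "am" in ops and engine_move in ops["am"]:
        return False
    return True


def suite_score(records, answers):
    """Count how many records the given moves satisfy, pairing them in order."""
    return sum(passes(parse_epd(rec)[1], mv) for rec, mv in zip(records, answers))
```

With these two records, an engine answering `d4` and `e4` would pass both, while one answering `d4` and `g4` would pass only the first. A suite built mostly from `am` records would measure what a program refrains from playing, which is the property argued for above.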

Completely different are patzer positions, where giving away a pawn
for an open file against the opponent's king measures something we
have already seen too much of in other testsets.

I remember GS2930: a program that gives away pawns there within a few
seconds a move is obviously not so strong positionally. It says something
about how AGGRESSIVE a program is, but nothing about how GOOD a program is.

The biggest criticism I have of such testsets, which claim to give a good
estimate of how good a program is, is the question: "why is Shredder 6 so
low on a few crucial points which the testset is supposed to address?
Isn't it four-time world champion?"

>On June 29, 2002 at 09:40:12, Vincent Diepeveen wrote:
>
>>On June 28, 2002 at 18:38:57, Peter Berger wrote:
>>
>>now try the reigning world champion 2001, DeepJunior 7, on your PC
>>and give it 12 hours to analyze, then post the score and post
>>the mainline.
>>
>>This instead of the program for which the testset has been
>>made, namely Fritz.
>>
>>Best regards,
>>Vincent
>
>
>Hello Vincent,
>
>first of all: my English is not as good as yours :-(
>
>Your last sentence is complete nonsense, sorry!
>Do you have any proof for your accusation?
>
>Why do you think that the "WM-Test" is made especially for Fritz:
>
>a) because this test suite is published by "CSS" ?
>The "WM-Test" was designed by Dr. Michael Gurevich and Heinz-Josef Schumacher,
>the biggest part of it  b e f o r e  publishing in CSS - and of course without
>any "pre-order" from CSS or anyone else.
>I know that because I'm a part of the "WM-Test team", responsible for testing
>different engines. In any case of "pre-order" I wouldn't have put ONE hour into
>such a project, believe it or not!
>BTW: H.J. Schumacher (together with Hubert Bednorz) designed the earlier test
>suite "BS-2830". Do you think the BS-2830 is also a "Fritz test" because it was
>also published in "CSS" (in 1997)?
>
>b) because of my EXCEL sheet that I sent you some weeks ago - with the
>results of over 80 engines, different Fritz 7 versions on top?
>Now I do regret that I gave you my test results of "WM-Test".
>If someone is honestly interested in my test results of 74 engines (tested on
>AMD Athlon Thunderbird 1400 MHz) please have a look at
>http://www.computerschach.de/test/index.htm
>
>It's not much fun for me
>- to work many hundreds of hours on a new test suite (together with Michael and
>Heinz)
>- then to give my results for free to everybody who's interested
>- and then to read such an insulting post as yours.
>
>Such things are liable to take away my pleasure in giving out the results of my
>test efforts for free.
>It's disappointing...
>
>Manfred
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.