Computer Chess Club Archives


Subject: Re: WM Test Position 1 - ENGLISH Testvalidity of "WM-Test" Part II

Author: Rolf Tueschen

Date: 13:41:29 06/10/04


On June 10, 2004 at 08:49:21, Mike S. wrote:

>On June 09, 2004 at 19:05:40, Rolf Tueschen wrote:
>
>>(...)
>
>>Now Hagra, an anonymous author with a good knowledge of statistics and chess,
>>has made the strongest criticism of the test that I know of. He basically doubts
>>that a position from real-life chess can test a machine, because it is
>>difficult to decide why the machine has adopted a specific continuation.
>
>Actually this would be a criticism of *all* test suites, because all of them
>follow the same concept: (Simply) *find the move* (but not, find the move and
>give me a perfect explanation/evaluation/analysis of why it is best).

Please don't exaggerate. It is the "WM-Test" that is being criticised, not all
tests. How can the first position of the "WM-Test" be a reasonable test position
if it has two reasonable solutions? Please prove that all tests have such shaky
test positions.



>
>So, if that criticism were valid, it would apply not only to the WM Test but
>obviously to all test suites.


You are exaggerating, and that is the only thing that is obvious here. But it
must be real fun for you to discredit a criticism of your beloved "WM-Test",
and if you can't refute the criticism you must invent delusional questions to
confuse the readers and users of your test.


>So, all authors and users of test suites have failed to spot that fatal mistake
>for so many years...? :-)) I don't claim that one single man cannot sometimes
>have the unique and correct ideas while all others are wrong, but such
>"Galileo" cases (and also "Leonardo" cases :-)) are very, very rare.

If you were fair in your reply, you would admit that Hagra's criticism makes
him the Galileo of WM-Test criticism. Yes, that's funny. :)




>
>>(...) The author Michael Scheidl, who is also known here,
>>assisted in that fight.
>
>Thanks for addressing me as a known author. So it seems that I have at least
>achieved a bit (it was a lot of hard work! :-))


Wrong interpretation. My point was the "known HERE", not the fact that you are
an author. "Known HERE" meant that you are known as an author here in
CCC. What is the reason for such a reaction? Are you so happy to forget about
Hagra for a moment? ;)



>
>>IMO the whole
>>line of argument is unfair, because if a general criticism is already sound and
>>comes to a negative judgement, then the practical argument no longer makes any
>>sense at all.
>
>How do you tell *if* such a criticism is sound for a specific position? There's
>no other choice than to analyse chess-wise (!) and illustrate the criticism with
>variations etc. The latest PGN provided is a good example of how such criticism
>should be presented. - The "general" criticism is not generally valid, because
>test authors, for example, usually choose typical test-like (or test-fitting)
>moves to ensure, or at least to make it most likely, that they won't be chosen
>for the wrong reasons without proper understanding. Typical are *sacrifices* no
>engine would choose "just for fun", so to speak. You don't give up a rook without
>seeing the gain. Of course you can't, or in general shouldn't, use positions for
>a test where the solution is a normal, boring, run-of-the-mill move which in
>itself doesn't allow any conclusion about what it is played for. Below I give two
>examples from my Quicktest to illustrate this. Tell me if you find engines which
>play these solutions for the wrong reasons. :-)
>
>(For a *very big* test, 1000+ positions, the above could possibly be ignored,
>because when, for example, an engine A plays strong moves in 867 out of 1000 and
>engine B in only 427, it will always tell a lot about the relative analysis power
>of A and B, regardless of the "correct reason" question.)
>
>Did you take a look at the currently available WM Test results of 230 (!!)
>engines yet? You'll find the
>
>known strong engines at the top,
>medium engines (good amateurs, older professional engines) in the middle ranks
>and weaker engines at the end of the ranking list.
>
>(With only very few exceptions or "surprising" rankings.)
>
>How would you explain these results if the whole test (and method) weren't
>valid?? Is it wizardry? :-)
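
As a rough illustration of the 867-versus-427 example quoted above, a two-proportion z-test shows how large such a gap is relative to sampling noise. This is only a sketch under stated assumptions: the function name is hypothetical, and it treats the two engines' results as independent samples, although both engines would be run on the same 1000 positions, so a paired test would be more appropriate. It also says nothing about whether the moves were found for the right reasons.

```python
import math


def two_proportion_z(solved_a, solved_b, n):
    """Return the z statistic for the difference between two solve rates.

    Assumption (not from the original post): each engine's n results are
    treated as an independent sample, and a pooled standard error is used.
    """
    p_a = solved_a / n
    p_b = solved_b / n
    pooled = (solved_a + solved_b) / (2 * n)
    standard_error = math.sqrt(pooled * (1 - pooled) * (2 / n))
    return (p_a - p_b) / standard_error


# The figures from the quoted example: 867 and 427 solved out of 1000.
z = two_proportion_z(867, 427, 1000)
print(f"z = {z:.1f}")  # far beyond common thresholds such as 1.96
```

A gap this large separates the two solve rates clearly, but a few ambiguous positions can still matter when two engines score close to each other.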


This is all very interesting and good stuff to think about. No doubt about it.
What I don't understand is that you are forgetting your own argument
against such weak positions with no unique solution. The trick here is that you
argue with 1000 positions, meaning that a single wrong position would then have
no significant influence on the final test results. Yes and no. Michael, the
terrible problem with the first test position is, as Hagra was able to prove,
that such a test is invalid as such with regard to the claimed conclusion about
"chess analysis ability". Because it is now proven that a stronger machine would
appear to be weaker, following the definitions of the test by Gurevich. Why can't
you understand that inevitable contradiction and idiocy? And you are still happy
with that test? Because the CSS journal has accepted it as the best?

ALL, or almost all, of what you wrote above is extremely interesting for me to
read, but you failed to address Hagra's criticism. Now we can speculate whether
you did so intentionally or because you still haven't grasped the meaning of
Hagra's criticism.


>
>>(...) it is also
>>possible to play Rad8 and Re3 only much later, instead of the test solution Re3.
>
>But Rad8 has no forcing character IMO and threatens nothing special. Re3 creates
>the strong threat Rxg3. (Just an observation. - I hope Mikhail will add some
>comments about this.)

I already thought that you were the faithful guide for Mikhail. :)

But seriously, how can you say such nonsense? Where is the test logic in your
idea of a forced continuation? Did you ever hear of the chess wisdom that the
threat of a threat is the strongest threat, and not the threat that has already
been played directly??? Why is the WM-Test searching for a forced line if there
is a good second line? Who has the better analytical abilities? The stronger
machine with the deeper calculation, or the weaker machine on weaker hardware
which only sees a seemingly forced single solution???? You know what I mean,
Michael?



>
>>All
>>who know details about tests know that the fact of a second solution decreases
>>the value of a test position.
>
>True - but only when there really is a second solution of *almost the same
>strength*; I think alternatives which are clearly weaker are not a problem
>because it is the challenge to find the *best* move and not just a good move,
>*in analysis* (it could be discussed if that is different in practical games).


I know what you mean, Michael, but you miss the meaning of the deeper
calculations of the stronger machine in computer chess. What does "the same
strength" mean to you?? The same value on the computer display? Michael, Michael!
Get real!



>Of course, perfectly clear positions without alternatives, i.e. only move X
>draws and all others lose, are preferable.
>
>>But the readers shouldn't forget that the main problem for such engine tests is
>>finding positions which allow one to test what the test's author claims.
>>Here the WM Test allegedly can test the ability to analyse. (...)
>
>It can, because
>
>1. we know the good difficult continuations of the test positions, and
>2. during the test run, engines have to analyse these.
>
>So the engines analyse and we can compare and judge the analysis results
>(at move #1). Basically that's not different from the way 99.9% of all test
>suites are done.
>
>>In other words, the academic doctor MG claims a deeper meaning
>>for his test, but in reality he has put together these 100 positions without
>>showing that the positions are valid for what he himself reads into the test!
>
>IMO the question of validity has to be answered chess-wise in the first place,
>and that MG has done by giving solution variations, sub-variations and comments
>in the data provided with the WM Test package. Nevertheless, many positions
>require some study by the user to be convinced of and/or to understand the
>solution. Some I found very difficult and had my doubts too. This may be
>strength-dependent (it may often be clearer for stronger players, and for
>engines). Nowadays tests just have to be very difficult to reveal any
>significant differences between strong engines.
>
>Anyway, it will always be best to discuss test positions and suites based on
>direct chess-wise analysis etc., which may be supported (or started) by engine
>output only. This will lead to constructive dialogue and fun with chess itself.

You are a real spin doctor. Here I can't contradict you. Fun, fun, fun, yes. That
is an extremely important factor of such a test. And here, in the case of a Wch
test, it is real fun because the Wch aspect is so interesting and fun. I mean,
wouldn't we all like to become Wchamps??? :)

That is the trick, and the flaw, of the whole "WM-Test". Of course the positions
are interesting chess. But already the first position is a weak test position
because it doesn't provide us with a unique solution that is directly
proportional to the strength of the machine!!! Don't you get what Hagra has
found out? Please, Michael, if you want to enter a fair debate, then stop
censoring with your CSS team and engage with the content of fair criticism.
There you still have a lot to learn. As a spin doctor you have incredibly funny
ideas, but without a good understanding of test theory you can't outplay test
critics like Hagra and yours truly.  Kind regards, Rolf T (whose decent messages
are still censored by the CSS team)



>One point of the discussion was that strange observations of engine output
>*only* are not sufficient to base valid criticism on, and if you look at the
>last criticism issued, it seems that there is consensus about this :-) maybe
>with the exception of you (?).
>
>Kind regards,
>Michael Scheidl
>
>[D]1n1r1rk1/ppq2ppp/3p2b1/3B1NP1/4PB1R/bP2P2P/P1P5/3KQ1R1 w - - 0 1
>1.Qc3! (Quick-01)
>
>[D]3Q4/3p4/P2p4/N2b4/8/4P3/5p1p/5Kbk w - - 0 1
>1.Qa8! (Quick-03)
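
For readers who want to check the two quoted [D] positions by machine, here is a minimal Python sketch that expands the piece-placement field of a FEN string into an 8x8 board and looks up a square. The function names are hypothetical and not part of any test package; the sketch only validates the board layout and does not check move legality or evaluate the solutions.

```python
def expand_fen_board(fen):
    """Expand the piece-placement field of a FEN string into 8 rows of 8 squares.

    Row 0 is rank 8 and row 7 is rank 1; empty squares are marked ".".
    """
    placement = fen.split()[0]
    ranks = placement.split("/")
    if len(ranks) != 8:
        raise ValueError("FEN must describe exactly 8 ranks")
    board = []
    for rank in ranks:
        row = []
        for ch in rank:
            if ch.isdigit():
                row.extend("." * int(ch))  # a digit means that many empty squares
            elif ch in "pnbrqkPNBRQK":
                row.append(ch)
            else:
                raise ValueError(f"unexpected character {ch!r} in FEN")
        if len(row) != 8:
            raise ValueError(f"rank {rank!r} does not describe 8 squares")
        board.append(row)
    return board


def piece_at(board, square):
    """Return the piece on a square given in algebraic form, e.g. "e1"."""
    col = ord(square[0]) - ord("a")
    rank = int(square[1])
    return board[8 - rank][col]


quick_01 = expand_fen_board(
    "1n1r1rk1/ppq2ppp/3p2b1/3B1NP1/4PB1R/bP2P2P/P1P5/3KQ1R1 w - - 0 1"
)
quick_03 = expand_fen_board("3Q4/3p4/P2p4/N2b4/8/4P3/5p1p/5Kbk w - - 0 1")

# The white queen starts on e1 (Quick-01) and d8 (Quick-03).
print(piece_at(quick_01, "e1"), piece_at(quick_03, "d8"))
```

Both strings expand to a full 8x8 board, and the target squares of the quoted solutions (c3 and a8) are empty in the respective positions.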



Last modified: Thu, 15 Apr 21 08:11:13 -0700
