Computer Chess Club Archives

Subject: Re: WM Test Position 1 - ENGLISH Testvalidity of "WM-Test" Part II

Author: Mike S.
Date: 16:22:34 06/10/04
On June 10, 2004 at 16:41:29, Rolf Tueschen wrote:

>On June 10, 2004 at 08:49:21, Mike S. wrote:
>
>>On June 09, 2004 at 19:05:40, Rolf Tueschen wrote:
>>
>>>(...) He basically doubts
>>>that a chess position from real life chess can test a machine because it is
>>>difficult to decide why the machine has adopted a specific continuation.
>>
>>Actually this would be a criticism of *all* test suites, because all of them
>>follow the same concept: (Simply) *find the move* (but not, find the move and
>>give me a perfect explanation/evaluation/analysis of why it is best).
>
>Please don't exaggerate. The "WM-Test" is criticised and not all tests. How can
>the first position of the "WM-Test" be a reasonable test position if it has two
>reasonable solutions? Please prove that all tests have such shaky test
>positions.

Why are you talking about the new criticism of the first position here? I was
referring to your *general remark* against test suites in the paragraph above.
Please re-read your own sentence :-)

>>So, if that criticism were valid, it would apply not only to the WM Test but
>>obviously to all test suites.
>
>
>You are exaggerating and that is the only thing that is obvious here. But it
>must be real fun for you to discredit a criticism of your beloved "WM-Test"
>and if you can't challenge the criticism you must invent delusional questions to
>confuse the readers and users of your test.

1. I can challenge almost any criticism when I want to, no problem :-)
2. I don't aim to discredit criticism of the WM Test.
3. I did claim that valid criticism of a test suite has to be based on chess
material (analysis variations, PGN, comments) for specific position(s), not only
on general remarks or observations of engine output, which are not sufficient.
4. Hagra seems to agree with point (3), because in his latest critique (of WM
Test position #1) he provided just that.

>If you were decent in your reply you would admit that Hagra's criticism makes
>him the Galileo of WM-Test criticism. Yes, that's funny. :)

He is not the first to have doubts about WM Test positions. Actually, I had
them too, for some positions. Mikhail explained some of the solutions to me in
the CSS Forum. Some are very difficult, with hidden tricks (hidden at my playing
strength, anyway); not every detail is given in each position's solution tree of
variations. General doubts about test suites have also been raised often before.
There's a whole army of such Galileos, it seems. But somehow they don't seem to
be convincing, as the "happy testing" continues and even increases, with more
and more suites being developed, etc. :-)

>>Thanks for addressing me as a known author. So it seems that I have at least
>>achieved a bit (it was a lot of hard work! :-))
>
>Wrong interpretation. My concern was the HERE known

I did indeed miss the word "here." I don't consider it important to be known
here (which I'm not sure I am); this place isn't really public. Actually, I had
removed the CCC from my browser favourites and only came back because of a
pointer and link to this current discussion, in which my name appears.

>>(...)
>>How would you explain these results if that whole test (and method) weren't
>>valid?? Is it wizardry? :-)
>
>
>This is all very interesting and good stuff to think about. No doubt about it.
>What I don't understand is the fact that you are forgetting your own argument
>against such weak positions with no unique solution. The trick here is that you
>argue with 1000 positions, meaning that then a single wrong position had no
>significant influence on the final test results. Yes and no. Michael, the
>terrible problem of the first test position is, as Hagra could prove, that such
>a test as such is invalid in regard of the claimed conclusion "chess analysis
>ability". Because it is proven now that a stronger machine would be seemingly
>weaker, following the definitions of the test by Gurevich. Why can't you
>understand that forced contradiction and idiocy? And you are still happy with
>that test? Because the CSS journal has accepted it as the best?

The 1000-positions idea did not refer to positions with second solutions, but
to positions which may seemingly be solved (earlier) for the wrong reason. I
meant that in such a big test, a few "wrong reason" solutions wouldn't be a
problem.

(But still, if there really were one (and only one) position with a second
solution, the problem would be 1/10 the size it would be in a test with 100
positions.)
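
To spell out that proportion, here is a minimal sketch in Python. The suite sizes (1000 and 100) are the ones from the discussion; the function name `flawed_share` is only an illustrative assumption.

```python
from fractions import Fraction

def flawed_share(flawed_positions: int, suite_size: int) -> Fraction:
    """Share of a suite's positions that are flawed (e.g. have a second solution)."""
    return Fraction(flawed_positions, suite_size)

# One flawed position in a 1000-position suite vs. a 100-position suite.
big = flawed_share(1, 1000)
small = flawed_share(1, 100)

# How large the problem is in the big suite relative to the small one.
ratio = big / small
print(big, small, ratio)  # 1/1000 1/100 1/10
```

This only counts positions; it says nothing about how much a single flawed position distorts the ranking of engines.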

>All, almost all, of what you wrote above is extremely interesting for me to read
>but you failed to address Hagra's criticism. Now we can speculate if you did it
>intentionally or because you still didn't get the meaning of Hagra's criticism.

http://f23.parsimony.net/forum50826/messages/100797.htm

>But seriously, how can you say such nonsense. Where is the test logic in your
>idea of a forced continuation?

I don't know, maybe because when a move can force something in my favour, it is
good? :-)

>Did you ever hear of the chess wisdom that the
>threat of a threat is the strongest threat and not the already/ directly played
>threat??? Why is the WM-Test searching for a forced line? If there is a good
>second line? Who has the better analytical abilities? The stronger machine with
>the deeper calculation or the weaker machine on weaker hardware which only sees
>a seemingly forced single solution???? You know what I mean, Michael?

No... the engine must always search for the best move from its viewpoint.
Basically, the whole testing logic is very simple: the positions are samples.
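
As a minimal sketch of that "find the move" logic (Python): each sample is a position plus the move considered best, and the score is the number of samples where the engine picks that move. The positions and key moves are copied from the diagrams in this message and are used here only as sample data; the engine answers and the names `SUITE` and `score_suite` are made up for illustration.

```python
# Each sample: (FEN position, expected best move).
SUITE = [
    ("1n1r1rk1/ppq2ppp/3p2b1/3B1NP1/4PB1R/bP2P2P/P1P5/3KQ1R1 w - - 0 1", "Qc3"),
    ("3Q4/3p4/P2p4/N2b4/8/4P3/5p1p/5Kbk w - - 0 1", "Qa8"),
    ("3k4/p7/K3BP2/8/7p/8/2P4P/8 w - - 0 39", "Kb7"),
]

def score_suite(suite, engine_moves):
    """Count the positions where the engine's move equals the expected move."""
    return sum(
        1
        for (_fen, best), played in zip(suite, engine_moves)
        if played == best
    )

# Hypothetical engine output, one move per position (not real results).
answers = ["Qc3", "Qa8", "Kb5"]
solved = score_suite(SUITE, answers)
print(solved, "of", len(SUITE))  # 2 of 3
```

Nothing here asks why the engine chose its move; only the move itself is compared.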

>>True - but only when there really is a second solution of *almost the same
>>strength*; I think alternatives which are clearly weaker are not a problem
>>because it is the challenge to find the *best* move and not just a good move,
>>*in analysis* (it could be discussed if that is different in practical games).
>
>
>I know what you mean, Michael, but you miss the meaning of the deeper
>calculations of the stronger machine in computerchess. What is the same strength
>for you?? The same value on the computer display? Michael, Michael! Get real!

For example, when there are two alternative continuations which a strong player
would both annotate with +/- at the end of the reasonable variations (regardless
of whether engines also evaluate them as equal). Or, more clearly, when there
are two variations which both lead to a perpetual check or something similar
that can be clearly evaluated. These are second solutions. When it's about mate,
though, the shorter mate is always better. Some people say mate is mate, no
matter how many moves it takes. I don't share that ludicrous opinion :-)) I
think in chess terms, not in result terms, where 1-0 and 1-0 look identical no
matter whether one was #7 and the other #23.

[D]3k4/p7/K3BP2/8/7p/8/2P4P/8 w - - 0 39
Only 39.Kb7!! is best!
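
The difference between "chess thinking" and "result thinking" about mates can be sketched in a few lines of Python. Subtracting the mate distance from a large constant is a common scoring convention, but the constant `MATE_SCORE` and both function names are assumptions chosen for illustration, not taken from any particular engine.

```python
MATE_SCORE = 100_000  # hypothetical constant, larger than any normal evaluation

def mate_in(moves: int) -> int:
    """Distance-aware score: a shorter forced mate gets a higher score."""
    return MATE_SCORE - moves

def result_only(moves: int) -> int:
    """Result thinking: every forced mate is just a win (1-0)."""
    return 1

# A mate in 7 compared with a mate in 23.
print(mate_in(7) > mate_in(23))           # True: the shorter mate is preferred
print(result_only(7) == result_only(23))  # True: both look identical
```

With the first function a slower mate can never count as an equally good second solution; with the second, it always would.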

>>(...)
>That is the trick and the wrong of the whole "WM-Test". Of course the positions
>are interesting chess. But already the first position is a weak test position
>because it doesn't provide us with a unique solution that is directly
>proportional to the strength of the machine!!! Don't you get what Hagra has
>found out?

Excuse me, when a move A sets off the tricks the test position is all about,
and there is another move B which delays that move for no apparent reason and
gains nothing in addition, I cannot consider B a valid second solution. Delaying
the correct solution (again and again) would in the end lead to *not reaching
the goal*. It makes no sense to claim that delaying the strongest continuation
is as good as the strongest continuation itself. It's much worse, because, for
example, it may create new chances for the defender. If it doesn't create such
chances, it's still worse simply because it needs more moves unnecessarily. I
don't see any gain in Rad8 yet (admitting that I haven't done intensive analysis
of my own of all that so far).

It's like you'd find a pot of gold, but instead of taking it home you think
"Hey, why take it now, so quickly? Let's go to the pub first, I'll come back
later..."

>(...) As a spin doctor you have incredibly funny
>ideas, but without a good understanding of test theory you can't outplay test
>critics like Hagra and yours truly.

I consider myself something like a computer chess test suite expert :-)) Sorry
for the self-praise. I think that is sufficient, and I don't need skills in
scientific test theory for that purpose. I think I know some basic concepts
(like identical test conditions, proper documentation, etc.).

>Kind regards, Rolf T (whose decent messages are
>still censored by the CSS team)

What you call censorship is a very labour-intensive way of letting guest posters
participate as often as possible without requiring a password, while at the
same time defending the message board against people who only want to make
trouble. There is no civil right that obliges an internet message board to give
everybody access at any time, for everything. Recently, we have sometimes used a
new moderation function which holds proposed new postings in a buffer (as known
from other fora of this type, which have used it for longer), and we decide which
ones appear online. It's just like writing a reader's letter to a newspaper.
There, too, somebody decides whether it gets printed or not (and certainly the
percentage of what goes to print is much smaller than with us :-)). Would you
call it censorship when you write a reader's letter to your newspaper, but they
don't always print it? I guess *you* would... but the rest of the world, except
a few fools, does not.

AFAIK there was an agreement in the (by now distant) past that you won't write
in the CSS Forum again, and that agreement is still in force, so to speak. The
reasons are known to all the persons concerned.

Regards,
M.Scheidl



>
>
>
>>One point of the discussion was that strange observations of engine output
>>*alone* are not sufficient to base valid criticism on, and if you look at the
>>latest critique, it seems that there is consensus about this :-) maybe with
>>the exception of you (?).
>>
>>Regards,
>>Michael Scheidl
>>
>>[D]1n1r1rk1/ppq2ppp/3p2b1/3B1NP1/4PB1R/bP2P2P/P1P5/3KQ1R1 w - - 0 1
>>1.Qc3! (Quick-01)
>>
>>[D]3Q4/3p4/P2p4/N2b4/8/4P3/5p1p/5Kbk w - - 0 1
>>1.Qa8! (Quick-03)
Re: WM Test Position 1 - ENGLISH Testvalidity of "WM-Test" Part II Rolf Tueschen 17:56:32 06/10/04
- Re: goodbye, thanks & farewell (was: WM Test ...) Mike S. 07:33:03 06/11/04
  - Re: goodbye, thanks & farewell * We will see us again in old Friendship! Rolf Tueschen 12:29:10 06/11/04