Computer Chess Club Archives


Subject: The Reasons for Misunderstandings in the Debate about "WM-Test"

Author: Rolf Tueschen

Date: 12:57:17 06/14/04


[This is an analysis without any insults or bad thoughts, it's just analysis!]


What is the "WM-Test" for, what do its authors say, and what do they want to
========================================================================
achieve?
==========

I have thought about this for a long time, because I am a bit confused by the
way certain people behave, people I consider smart enough to tell green from
red or yes from no. I am confused because some of them are trying to persuade
me that I should see yellow when in reality I see only black. For a scientist,
so to speak, this is quite something. It also has its ridiculous side, because
in science you do not normally have to keep proving that 2 plus 2 makes 4. That
is a fact, and nobody who disputed such a trivial thing could expect scientists
to accept him as anything like an expert. They would rather see such a person
as a disabled or uneducated and poor individual. So, to make this clear right at
the beginning, there is no reason for me to insult anyone or to become angry.
All this will be more about astonishment and laughing out loud at a real satire.


                              +++++++++++++


I can show with a simple quotation what the authors of the WM-Test had in mind
with their 100 World Champion positions.

Some time ago Stefan Meyer-Kahlen and Chrilly Donninger were guests at the
so-called hearing in the CSS forum. Of course the "value" of position tests was
discussed, and the test authors were interested too:

a) Manfred Meiler: Frage gewissermaßen "in eigener Sache": Interessieren Dich
Shredders Ergebnisse in Stellungstests (z.B. dem neuen CSS-WM-Test oder dem
Zusatztest WM-Test Extra)?
SMK: Puh, ich versuche, auf diese Frage ehrlich zu antworten, auch wenn ich
dabei manchem hier sicher vor den Kopf stoßen werden. Also, das Ergebnis in der
Form, dass Shredder xyz Elo im Test abc hat, interessiert mich nicht, da dies
meiner Meinung nach dass man aus diesen Zahl nichts ablesen kann.
Manfred Meiler: Wenn ja, wie interpretierst du Shredders Ergebnisse dabei?
Fließen Erkenntnisse daraus auch in die Programmentwicklung ein?
SMK: Nein, nicht aus den reinen Zahlen.
Manfred Meiler: Oder sind vielleicht ganz andere Stellungstypen/Aufgaben für
Programmierer bei der Weiterentwicklung ihres Programms von Interesse ?
SMK: Ich haben Interesse an Aufgaben, die allgemeine Schwächen in Shredder
aufdecken. Diese Aufgaben versuche ich dann zu verbessern, da ich mir davon das
Abstellen oder Mindern einer Schwäche verspreche. Leider sind diese Aufgaben
schwer zu finden. Ein einfaches Beispiel dafür ist Stellung 3 aus dem alten
BT-Test.

In English:

Meiler: A question, so to speak, "on my own behalf": are you interested in
SHREDDER's results in position tests (e.g. the new CSS WM-Test or the
supplementary WM-Test Extra)?
SMK: Phew, I'll try to answer this honestly, even if I am sure to offend some
people here. Well, the result in the form that SHREDDER has xyz Elo in test
abc does not interest me, because IMO one cannot conclude anything from such
numbers.
Meiler: If yes [sic! Rolf], how do you interpret SHREDDER's results? Do
insights from them influence your programming?
SMK: No, not from the bare numbers.
Meiler: Or are perhaps totally different types of positions of interest to
programmers for the development of their programs?
SMK: I'm interested in positions that reveal general weaknesses in SHREDDER. I
then try to improve these [? Rolf] positions, because I hope to eliminate or
reduce a weakness that way. Unfortunately such positions are difficult [sic!
Rolf] to find. A simple example is position 3 of the old BT-Test.


Michael Gurevich: Hallo Herr Dr. Donninger, zu seiner Zeit haben Sie sich
ausführlich mit dem in CSS veröffentlichten BS-2830- Stellungstest und seinen
Aufgaben beschäftigt. Ihre Vorschläge wurden im neuen Weltmeister-Test
berücksichtigt, der im Moment aus 90 vielseitigen und hochwertigen Stellungen
besteht. Meine Fragen sind:
1. Das Wissen macht Brutus als eine Hardware-Lösung unwesentlich langsamer.
Bedeutet es, dass Brutus ein sehr gutes Analyseinstrument sein soll? Das
Problem: Tausende Schachspieler von Amateuren bis Anand und Kasparov benutzen
PCs fast ausschließlich für Training.
2. Welchen Platz erwarten Sie vom Brutus in unserer unabhängigen
Engines-Rangliste zum WM-Test (Athlon 1.4, Tester Manfred Meiler), der die
Ergebnisse von über 75 Programmen enthält?
Chrilly: Ich glaube, das wurde schon während der Sprechstunde am Sonntag
beantwortet. Vielleicht noch etwas Prinzipielles zu den Stellungstests: Ich
schließe mich auch da 100% der Meinung von Stefan Meyer-Kahlen im letzten CSS
an.

In English:

Gurevich: Hello, Dr. [sic! Rolf] Donninger, some time ago you dealt in detail
with the BS-2830 position test published in CSS [sic! Rolf] and with its
positions. Your proposals were taken into account in the new World Champion
test, which currently consists of 90 versatile, high-quality [sic! Rolf]
positions. My questions:
1. Its knowledge makes BRUTUS, as a hardware solution, only insignificantly
slower. [...]
2. Which place [sic! Rolf] do you expect for BRUTUS in our independent
[sic! Rolf] engine ranking list [sic! Rolf] for the WM-Test (Athlon 1.4, tester
Manfred Meiler), which contains the results of over 75 programs?
Chrilly: I believe this was already [sic! Rolf] answered during the hearing
last Sunday. Perhaps something more, on principle, about position tests: here
too I agree 100% with the opinion of Stefan Meyer-Kahlen in the last CSS.
[sic! Rolf]

                            +++++++++++++++++++++



You know what I mean? Here in CC we have three outstanding experts and
computerchess programmers (R. Hyatt, Ed Schröder and Uri Blass) who unanimously
agree that position tests with their ranking lists as such do NOT work for the
programmers themselves. In CSS itself we learn that two further programmers,
the multiple World Champion SMK and the academic doctor and longtime programming
expert Donninger, both agree that position tests, ranking lists and Elo numbers
have no influence on their programming or on improving their babies. However,
SMK ultimately states that good positions are _difficult_ to find. He says this
although he knows that Gurevich implies he has "found" spectacularly good
positions from human chess games, played by World Champions and suited to
finding out how good the analytical abilities of the programs are!
Moreover, we have the statement of Schröder, who asked Scheidl whether he could
find a single programmer among the top 10 who believed in position tests. He
said: you won't find a single one!

This is the one side. Five experts, the side of science, say position tests are
nonsense. They simply don't _work_. The ranking lists they produce _are_ flawed.
The concrete use of position tests for improving engines is zero, because most
of the time these positions are not good enough to give the programmers useful
information, ALTHOUGH the biggest argument of the other side is that, for
example, the WM-Test IS by definition made for judging the analytical abilities
of engines! Apparently this claim is pure fantasy, at least if these real
programmers are to be believed. And since we have no other experts, we must
believe them!!

Conclusion: the real experts and also the programmers themselves say that
position tests have no meaning for them.


                          ++++++++++++++++++++++++


What, then, is the reaction of the "other" side, the side of the authors and
their main supporter, the CSS journal?

Here are a couple of quotes translated from the German:

(don't be surprised if now we must read some veritable insults!)


^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


Gurevich:  as critics I can only accept people who are working with the WM-Test

Gurevich: the opinions of non-users don't interest me

Scheidl: It is shocking how unbelievably incompetent [sic! Rolf] the view of an
experienced, successful computerchess programmer, a true hero of the history of
computerchess [meaning Hyatt of course! Rolf], can be about position tests which
he allegedly [sic! Rolf] never uses. Perhaps this explains the obvious
impossibility of surpassing a certain level of performance... [sic! Rolf]

Gurevich: A practical chess game is also nothing but a sequence of positions,
very similar [sic! sic! Rolf] to a position test. The main differences are only
[sic! Rolf] that the successive game positions belong together, but most of the
time they don't have a pronounced test character. [multi sic! Rolf]

Scheidl: For these persons this is all about politics at its lowest level, badly
disguised by computerchess terminology applied on the surface, which at times
takes on the character of verbal hate crimes. [sic! Rolf]

Scheidl: Of course an expert [sic! Rolf] who takes on the burden of reacting
unemotionally must inevitably face the objection that he did so only because he
was a member of the Evil Empire and had orders to suppress all kinds of
criticism.

Scheidl: This is probably not a good example of unemotional criticism, but a
disappointed comment about a potential [sic! Rolf] idol [meaning Hyatt again!
Rolf] who in effect dismisses a very good [sic! Rolf] test, or even all position
tests, as worthless.

Scheidl: Position tests have always been a totally normal part of this
department [sic! German "Fachbereich", as if this were about universities and
not a hobby! Rolf] and of the application culture [sic! Rolf] of chess computers
and programs. Should one throw them all away because some alleged [sic! Rolf]
superguru [guru coming from the gurus in India who perform many impressive
experiments and are worshipped like religious idols; Rolf] plus a couple of
troublemakers in the scene do NOT know how to appreciate the WM-test? The
WM-test serves as an occupation for the fans and as an interesting pastime, it
serves WELL as a comparison of performance for the programmers [sic! Rolf] or
for point-by-point statements, and it is a stimulus to engage with high-class
and difficult analyses, etc. etc.

Scheidl: In analysis, just as in a game, engines always meet positions and try
to find the best moves. This is by far point number one. Time allocation, hash
retention etc. are side aspects, but with undoubted influence, so that an exact
mirror of game-playing strength is not to be expected. But here is the close
connection. To doubt that is a good way to out oneself as ignorant. This is a
favored choice for several people.


                          ++++++++++++++++++++++++++


The last quote above is the best proof of the complete misunderstanding on the
part of the defenders of the position tests.

a) they admit that it is all about fun and the joy testers get from learning
from World Champion positions in chess, without any importance for chess
programmers, but

b) on the other hand, in a totally inconsistent claim, it is clear what the
authors and supporters really hope for; they make the mistake of taking highly
interesting chess positions from chess heroes as automatically good puzzles for
chess-playing programs, although these positions, separated from their games,
are of no use to the top programmers. As Bob explained, this is not because
these positions yield no results, but because the results they give are already
known BEFORE! And for engines where the results are not yet known, the
positions don't work. They simply don't deliver what the authors claim!


Is that truth too difficult to understand? Apparently yes. It might be too
difficult for laymen. But instead of believing a professor and expert like the
multiple World Champion Bob Hyatt or the other big name, Ed Schröder, they
insinuate hatred and ignorance on the part of the real experts in computerchess.
What nonsense! As if these heroes had any need to hate CSS or to have an opinion
about it at all...


I hope I have shown how biased the behavior of the CSS spin doctors, mainly
Scheidl, is. But I also hope I have explained why the position tests
unfortunately do not have the value for programmers that is always insinuated.
The experts themselves, all in a row, denied any value at all. End of the
debate. I know the reason for the dissent. All those who work with such tests
enjoy the beauty of chess, and they also enjoy seeing what their favorite
programs can do in such difficult positions. This can become a hobby in itself.
But here we must judge the meaning of such tests for the programmers themselves.
They simply can't use the results of such tests. They would be very pleased
if they COULD!!! But unfortunately they can't.

                               ---------------


Excuse my poor English. It's very difficult to argue in a second language. I
also apologize for possible small mistakes in my translations. I am sure that I
never completely changed the direction of the meaning.





