Computer Chess Club Archives


Subject: Re: Ultimate Use of Suites of Test Positions??? - The Power of Chance!

Author: Rolf Tueschen
Date: 15:29:27 12/07/02
I was away the whole day, but I found some good ideas for the answer, Bob.

On December 06, 2002 at 20:11:31, Bob Durrett wrote:

>On December 06, 2002 at 19:48:09, Rolf Tueschen wrote:
>
><snip>
>
>>Also a debate between you and me and others here is the best that could happen
>>because that is interdisciplinary cooperation. You could bring the very best of
>>your talents into the debate because others might go off on too many
>>tangents... then you organize the recovery!
>>
>>Rolf Tueschen
>
>My debating skills are worse than those of a newborn baby!  I know my
>limitations.  That is my one great strength [I think.]  Besides, there are other
>productive formats for discourse besides debate.

Brainstorming.


>_ _ _ _ _ _ _ _ _ _ _ _
>
>But I would like to get back to your ideas regarding chess software.
>
>In particular, your feeling that it would not be possible to measure the
>strength of a chess engine [or a human either, for that matter] by using a set
>of test positions.

Objection. For a human it is always a good test of how many "expressions" of
chess technique he has mastered.

I did not say that tests were nonsense, only that positional positions cannot
be used to test computers.

>
>When students graduate from college with a Bachelor's Degree, here in the USA,
>they are encouraged to take a comprehensive exam which is intended to indicate
>whether or not the student learned anything. [Versus wasting several years.]
>
>I had to take such a test.  As an electrical engineer, I was required to take
>the GRE Advanced Test in Engineering.  I did very well on that test and was
>admitted to Graduate School primarily for that reason.
>
>I would like to suggest that, if I had to take such a test, it is only fair that
>every chess engine should have to take an equivalent test too!

Now that is one of the sentences I couldn't understand yesterday. For what
purpose do you want to require that? As for your big test there, I would say
that it would be an equally good test if a professor simply talked with the
student, because then he could see whether the student had understood
something. I don't trust test suites too much. Multiple-choice testing is
extremely error-prone.


>
>The test would be very comprehensive.  It would include five or ten suites of
>test positions.  Perhaps 500 positions in all, minimum.  A new set of positions
>would be used each year.
>
>In the proposed scenario, the testing organization should have the
>responsibility and resources necessary to design and adjust the tests to match
>the SSDF results.

Again I can't understand you. What do the incredibly invalid test results of
the SSDF have to do with our question of position testing?



>
>In other words, I propose a comprehensive test which has, itself, been tested
>and verified against the SSDF [and similar] test data.

Excuse me but this is a logical impossibility. You can't expect to create
something out of "nothing". That would be called magic.  :)


>
>If you stick to your guns on this, you will assert that the proposed idea would
>fail miserably.  Right?  But why would it fail?  Could you be specific, please?
>
>: )  : )  : )  : )  : )  : )  : )  : )  : )  : )  : )  : )  : )  : )
>
>Bob D.


Ok I stick to my guns. Shooting mode ON:

Now the creative part of my Saturday reflections on the road in the cold. In
Germany "cold" is already plus 1 degree Celsius. <g>

I want to prove now, once and for all, why these SSDF matches don't say
anything interesting about the strength of chess programs. Programmers of the
world, please listen! (Sorry, it's shooting mode.)

What is a chess game?
======================

a) GM chess

A chess game between masters is an interrelated combination of the correct
application of GM technique and chains of chess positions. A GM can win a game
either 1) by more sophisticated application, 2) by an error of the opponent, or
3) just by luck of position that neither could foresee (in line with the law
that concrete positions dominate in chess, not 'technique', wishful thinking or
belief in magic). Point 3) is interesting, because as the game gets closer to
the concrete position both masters know the solution, but only one side can
profit from it.

b) amateur 1500 chess

That kind of chess is also a chain of concrete positions, but after a short
opening phase (learned by heart) almost all positions are treated by chance,
without correct technique. The amateur has a lot of ideas but can't judge
which idea is appropriate in the concrete position. The amateur believes that
his idea is itself the power that brings the win. In reality a wrong move
could well win because of a false reaction by the opponent. But since the next
move could be weak as well, one false reaction normally doesn't lose. We could
sum up that such a game is a series of moves by chance, and in the end luck of
position normally decides, unless the many mistakes have already led to a clear
material advantage.

c) chess programs chess

After the extremely high-level opening phase, tactical positions (appearing by
chance or through clever choice of the opening books and their depth) are played
with extremely exact technique by both sides, resulting either in a material
advantage and later a win or, by chance of position, in one of the three
possible results. Now my interest focused on those games that reach positions
without tactics after the opening phase. Of course that must happen in games
against humans, because machines can't play positional chess and will always
search for tactics. Even if there are none! ;)
Now in these positional positions it's again a matter of chance, with all
possible combinations. A basically better position (from a GM perspective) could
lead to a loss, a draw or a win, and likewise any other position could lead to
these three results. Because everything happens in chance mode. 'Digging in
fog' mode.

Now let's quickly draw the logical conclusions. If that happens in games, it
will also happen in test positions. (I'm still talking about c)!)

Now the irony is that the test creators calculate their results for the
positional positions from the evaluations shown on the programs' displays and
not from the machines' understanding. Reasonably so, because we don't know a
thing about this "understanding". A programmer could say (see Bob Hyatt) that
his program knows the story of the two advanced pawns and their strength, but
you could never show that with test positions. Because the final position is
- Rolf said it before - too late to be sure that the program has understood the
topic. But if we go back a few moves in the development of the position, the
chance factor could destroy our beautiful idea.

Just to mention the new discovery of the defenders of such tests. They insinuate
that if something goes wrong in one position it doesn't matter, because over
the "overall" 100 positions it would be "ausgemittelt" (averaged out). A
creative term straight out of magic. It's basically the argument that 1 or 10
nonsense positions would not be so important if we base our final judgement on
100 positions. I would say true, unless the 10 nonsense positions are all in the
positional part of the test. Rolf dixit! <g>
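
To make the arithmetic behind this objection concrete, here is a minimal sketch
in Python. The 100 positions and the 10 nonsense positions come from the
paragraph above; the split into 80 tactical and 20 positional positions is an
assumption made purely for illustration, as are the names `noise_share` and
`POSITIONAL_SIZE`.

```python
def noise_share(noisy: int, total: int) -> float:
    """Return the fraction of positions in a group that are nonsense."""
    if total <= 0:
        raise ValueError("total must be positive")
    return noisy / total


SUITE_SIZE = 100       # size of the whole test suite, as in the post
NONSENSE = 10          # number of nonsense positions, as in the post
POSITIONAL_SIZE = 20   # ASSUMPTION: size of the positional part of the suite

# Averaged over the whole suite, the nonsense positions look harmless.
overall = noise_share(NONSENSE, SUITE_SIZE)

# If all of them sit in the positional part, that part is badly affected.
positional = noise_share(NONSENSE, POSITIONAL_SIZE)

# The tactical part then contains no nonsense positions at all.
tactical = noise_share(0, SUITE_SIZE - POSITIONAL_SIZE)

print(f"whole suite:     {overall:.0%}")
print(f"positional part: {positional:.0%}")
print(f"tactical part:   {tactical:.0%}")
```

Under these assumed numbers the suite as a whole is 10% nonsense, but the
positional part alone is 50% nonsense, so the overall average hides exactly the
part of the test that is in dispute.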

Back to SSDF games. In the end all these games are (in the small sector of real
chess between opening books and endgame tables!) a chain of chance moves in
blindness mode. And since the blindness of each war generation (still shooting
mode ON) is universal (I liked the term in the reasoning of Michael Gurevich in
his comeback speech), the best programs of the same generation are always in the
same neighborhood. But that SSDF result was known beforehand because it's a
universal LAW!
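
What this claim would imply, if it held, can be sketched with a small
simulation in Python. It shows only the consequence of the premise, not that
the premise is true. Everything in it is an assumption for illustration: five
hypothetical programs, 1000 games each, and every game decided purely at random
with win, draw and loss equally likely. The function name `chance_match` is
made up for this sketch.

```python
import random


def chance_match(games: int, rng: random.Random) -> float:
    """Score fraction for one side when every game is decided at random.

    ASSUMPTION: win (1 point), draw (0.5) and loss (0) are equally likely.
    """
    points = sum(rng.choice((1.0, 0.5, 0.0)) for _ in range(games))
    return points / games


rng = random.Random(2002)   # fixed seed so that a run can be repeated
GAMES_PER_MATCH = 1000      # ASSUMPTION: number of games per program
PROGRAMS = 5                # ASSUMPTION: number of hypothetical programs

scores = [chance_match(GAMES_PER_MATCH, rng) for _ in range(PROGRAMS)]

for number, score in enumerate(scores, start=1):
    print(f"program {number}: score fraction {score:.3f}")
```

Under these assumptions every score fraction lands close to 0.5, that is, all
programs end up "in the same neighborhood" without any of them being measurably
stronger than the others.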

So, as always, I repeat that all SSDF results are invalid.

You could only test chess programs in games against human GMs. Because then you
leave the sphere of chance and magic, apart from the general restriction of the
basic chance factor through the dominating concreteness in chess (where even GMs
are helpless, but against programs this influence can still be properly
controlled, with the exception of exhibition matches, where the GM must also
take care of the commercial interests of the company, see Bahrain...).

Please tell me if you can use some of these thoughts, Bob.

Rolf Tueschen