Computer Chess Club Archives

Subject: Re: an example how users - not programmers - use tests

Author: Rolf Tueschen

Date: 09:34:43 06/20/04

On June 20, 2004 at 12:10:57, David Dahlem wrote:

>I've seen numerous examples of one engine solving a test suite position in a few
>seconds, while another engine of known equal game playing strength never finds
>the solution, even after hours of analysis. To me, this makes test suites
>worthless, or at least very difficult to interpret the results.
>
>Regards
>Dave


Yes, correct. This is what is called the lack of reliability of the results, as
Sandro explained. It's a typical flaw of these position tests, and everyone who
knows testing knows it. The question is how to explain that triviality to
laymen, to motivated users, and to a founder with a blind spot, especially one
who is caught in the circular argument that every critic should first run the
test suite because he would THEN realize how good it is. And you know my view
on the chess quality of these positions...! I can only repeat this: a famous CC
journal and a whole team of forum moderators who don't want to "hurt" a test
founder, and who therefore tolerate that he stays caught in such a circle, are
mainly responsible for this mess. That someone, even a scientist, _can_ go wrong
and not realize it is not such a rare event. It doesn't mean that he's bad or
unintelligent. Sometimes you have this "wall" in your head, and you can't find
the loose brick. Later you burst out laughing and wonder why you couldn't see
it. In our case the main founder is a Russian academic doctor who has certainly
learned the basics of scientific reasoning. Therefore he will understand in the
end the difference between testing the end product and testing a prototype. He
also knows the two obstacles, namely validity and reliability. And he should
know that statistical calculation can never "create" significance if it isn't
in the data.

I also think that we must change a couple of terms. When users play with
their engines and run them through 100 positions, this can't be called
"testing"! It looks like testing, but it isn't.