Computer Chess Club Archives



Subject: Re: mclane's summer-tournament: round 6 update

Author: Don Dailey

Date: 17:06:32 09/06/98



On September 06, 1998 at 18:54:20, Thorsten Czub wrote:

>On September 06, 1998 at 16:55:48, Don Dailey wrote:
>
>>I think your method is pretty flawed if it depends very much on
>>subjective analysis.  For instance, if I did not like a program,
>>I would be looking for problems and I would find them.  I would
>>dismiss the good moves and make excuses.  I wouldn't do this
>>on purpose, I would do it simply because I was human.
>
>We can bet that I am faster than your statistical approach and also more
>precise.
>My method has only one negative problem: I can only be exact with programs I
>have a positive feeling/relationship with.
>
>My method works like relationships with human beings.
>You can only make positive progress if your relation with the other human
>beings is OK. If the relation is shit, you cannot produce any good results.
>
>
>>Your method should always be a part of the whole testing philosophy
>>however.  You can get into the trap of playing thousands of
>>games and not seeing a single one.  I think you must continue
>>to do what you do, but other more objective tests MUST also be
>>performed.
>
>I do statistical stuff too. But it is not done to decide about strength, it is
>only done to prove my prejudices or, to put it more positively, to prove my
>thesis.
>
>I do not begin with RESULTS. I only use them for proving the judgement.
>
>Very often the empirical data that is produced after a few months of playing
>gives the same judgement as my first impression of the program. This is the
>same with human beings. Of course there are a few human beings that impress
>you, and after a while you find out they are liars and assholes and they spend
>much of their lives acting as if they are nice friends.
>Chess or humans. It is exactly the same for me.
>Believing in friends, believing in programs. Exactly the same.
>
>
>> And taking a liking to the way a program plays
>>is just a whole lot different from knowing if it can get results.
>>
>>Self testing is an important part of the way we test.  It is
>>ideal for some things, not very good for others.  We test all
>>sorts of different ways and watch our program play to learn
>>what is wrong.   I don't think anyone has found the ideal method
>>for accurately measuring tiny program improvements.
>
>Right. It is all methods together. But still, despite all the different methods,
>I believe there is a non-empirical, emotional way of doing it that also gives
>exact results. But it is difficult to find out WHY it works.
>
>> If you
>>give me a way to measure a 1 rating point improvement, I will
>>write you a chess program that beats all the others with no
>>question about who is strongest.
>>
>>Larry had a theory that self testing exaggerates improvements.
>>If you make a small improvement, it will show up more with
>>self testing.  If this is true (we called it a theory, not a
>>fact) then this is a wonderful way to test because small
>>improvements are so hard to measure and this exaggerates them.
>
>It can also IMO lead you in a completely wrong direction,
>when the new feature only works against x and not against all other
>programs y, or against humans. You think you have an improvement, and in fact
>it is only an improvement relative to x.
>
>>We know that self testing won't expose our program to weaknesses
>>that it itself cannot exploit.  That's why we consider it as
>>a supplement to other kinds of testing.
>
>No problem with this point of view. My point was that self-testing alone will
>not work, in the same way that test suites alone do not show you anything.
>It is the whole stuff, together. And still it is complicated enough and takes
>enough time.
>
>>- Don


We agree that self testing alone will not work.  I have heard lots
of theories about WHY self testing is horrible but I have never seen
this be much of a factor in practice.  What I DO see is that self
testing may not expose a problem that testing against a human or
a program by a different author might.  But I have never seen it fail
to measure a program improvement.   For instance, let us say our
program has zero king safety.  We keep improving it and find that
it gets stronger and stronger, and yet having no king safety does
not seem to be a big weakness because neither program exploits
this fact.  You don't need to understand it if the opponent does
not understand it either!    But if I implement king safety in
one version, that version will not fail to exploit its absence in
the non-king-safety version.  It will register immediately as an
improvement.
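
To put a number on that kind of self test, here is a rough sketch in plain
Python (the match results are made up for illustration; this is not our
actual test harness) of how a head-to-head match between the king-safety
build and the baseline can be turned into an Elo estimate with an error bar:

import math

def elo_diff(wins, draws, losses):
    # Estimate the Elo edge of version A over version B from a
    # head-to-head match, with a rough 95% confidence interval.
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games            # A's scoring rate
    # Per-game variance of the result (1, 0.5 or 0), then the standard
    # error of the mean score over the whole match.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / games
    se = math.sqrt(var / games)

    def to_elo(s):
        s = min(max(s, 1e-6), 1.0 - 1e-6)           # avoid infinite Elo
        return 400.0 * math.log10(s / (1.0 - s))    # logistic Elo model

    return to_elo(score), (to_elo(score - 1.96 * se),
                           to_elo(score + 1.96 * se))

# Made-up numbers: the king-safety build scores +560 =950 -490 over
# 2000 self-play games against the no-king-safety baseline.
elo, (low, high) = elo_diff(560, 950, 490)
print(f"about {elo:+.1f} Elo (95% interval {low:+.1f} .. {high:+.1f})")

With numbers like those the edge comes out at roughly +12 Elo, give or take
about 11, which is exactly the kind of signal that registers immediately in
self play.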

We play OTHER programs to find out what weaknesses we have.  The
self testing is great for what it does, but this is a better way
to seek out problems.   Each program we test against will exploit
different weaknesses.   Also we will tend to beat various
programs in characteristic ways.  Rebel seems to always win with
some passed pawn issue.  We have to beat Rebel in the middlegame
and be in great shape by the endgame or we will lose.   We tend
to outplay Crafty in the middlegame, as long as king safety is
not involved.  If we survive the king safety and Crafty survives
the middlegame, Crafty will tend to win more often than not.
Sometimes there are exceptions, but these themes occur over and
over and it's a different thing with each program, so we learn
a lot in this way.  You don't get this from self testing.

But the improvements we make in any of these areas will always
show up in self testing.  Self testing is highly underrated,
probably because someone decided long ago that it was bad and
it got repeated too many times!
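
On the earlier remark about measuring a 1 rating point improvement, a
back-of-the-envelope sketch (same assumptions as above: independent games,
binomial-style error bars, the usual logistic Elo curve) shows why that is
so hard, even with self testing:

import math

def games_needed(elo_edge, sigmas=2.0):
    # Approximate number of games before an Elo edge of elo_edge
    # exceeds `sigmas` standard errors of the match score.
    p = 1.0 / (1.0 + 10.0 ** (-elo_edge / 400.0))   # expected scoring rate
    delta = p - 0.5                                  # edge over an even score
    # Require delta >= sigmas * sqrt(p * (1 - p) / n), solved for n.
    return math.ceil(sigmas ** 2 * p * (1.0 - p) / delta ** 2)

print(games_needed(10.0))   # roughly 4,800 games to resolve 10 Elo
print(games_needed(1.0))    # roughly 480,000 games to resolve 1 Elo

A 10 Elo change is visible after a few thousand games, but a 1 Elo change
needs on the order of half a million.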


- Don



