Computer Chess Club Archives



Subject: Re: mclane's summer-tournament: round 6 update

Author: Christophe Theron

Date: 01:47:28 09/07/98



On September 06, 1998 at 20:06:32, Don Dailey wrote:

>On September 06, 1998 at 18:54:20, Thorsten Czub wrote:
>
>>On September 06, 1998 at 16:55:48, Don Dailey wrote:
>>
>>>I think your method is pretty flawed if it depends very much on
>>>subjective analysis.  For instance, if I did not like a program,
>>>I would be looking for problems and I would find them.  I would
>>>dismiss the good moves and make excuses.  I wouldn't do this
>>>on purpose, I would do it simply because I was human.
>>
>>I bet that I am faster than your statistical approach and also more
>>precise.
>>My method has only one drawback: I can only be exact with programs I
>>have a positive feeling for and a good relationship with.
>>
>>My method works like relationships with human beings.
>>You can only make progress if your relation with the other human
>>beings is OK. If the relation is shit, you cannot produce any good results.
>>
>>
>>>Your method should always be a part of the whole testing philosophy
>>>however.  You can get into the trap of playing thousands of
>>>games and not seeing a single one.  I think you must continue
>>>to do what you do, but other more objective tests MUST also be
>>>performed.
>>
>>I do statistical stuff too. But it is not done to decide about strength; it is
>>only done to prove my prejudices or, to put it more positively, to prove my
>>thesis.
>>
>>I do not begin with RESULTS. I only use them to prove the judgement.
>>
>>Very often the empirical data that is produced after a few months of playing
>>gives the same judgement as my first impression of the program. This is the
>>same with human beings. Of course there are a few human beings who impress
>>you, and after a while you find out they are liars and assholes who spend
>>much of their lives acting as if they are nice friends.
>>Chess or humans. It is exactly the same for me.
>>Believing in friends, believing in programs. Exactly the same.
>>
>>
>>> And taking a liking to the way a program plays
>>>is just a whole lot different from knowing if it can get results.
>>>
>>>Self testing is an important part of the way we test.  It is
>>>ideal for some things, not very good for others.  We test all
>>>sorts of different ways and watch our program play to learn
>>>what is wrong.   I don't think anyone has found the ideal method
>>>for accurately measuring tiny program improvements.
>>
>>Right. It is all the methods together. But still, despite all the different
>>methods, I believe there is a non-empirical, emotional way of doing it that
>>also gives exact results. But it is difficult to find out WHY it works.
>>
>>> If you
>>>give me a way to measure a 1 rating point improvement, I will
>>>write you a chess program that beats all the others with no
>>>question about who is strongest.
>>>
>>>Larry had a theory that self testing exaggerates improvements.
>>>If you make a small improvement, it will show up more with
>>>self testing.  If this is true (we called it a theory, not a
>>>fact) then this is a wonderful way to test because small
>>>improvements are so hard to measure and this exaggerates them.
>>
>>It can also, IMO, lead you in a completely wrong direction:
>>when the new feature only works against x and not against all the other
>>programs y, or against humans, you think you have an improvement, when in
>>fact it is only an improvement relative to x.
>>
>>>We know that self testing won't expose our program to weaknesses
>>>that it itself cannot exploit.  That's why we consider it as
>>>a supplement to other kinds of testing.
>>
>>No problem with this point of view. My point was that self-testing alone will
>>not work, in the same way that test suites alone do not show you anything.
>>It is all of it, together. And still it is complicated enough and takes
>>enough time.
>>
>>>- Don
>
>
>We agree that self testing alone will not work.  I have heard lots
>of theories about WHY self testing is horrible but I have never seen
>this be much of a factor in practice.  What I DO see is that self
>testing may not expose a problem that testing against a human or
>a program by a different author might.  But I have never seen it fail
>to measure a program improvement.   For instance, let us say our
>program has zero king safety.  We keep improving it and find that
>it gets stronger and stronger, and yet having no king safety does
>not seem to be a big weakness because neither version exploits
>this fact.  You don't need to understand it if the opponent does
>not understand it either!    But if I implement king safety in
>one version, that version will not fail to exploit its absence in the
>non-king-safety version.  It will register immediately as an
>improvement.
>
>We play OTHER programs to find out what weaknesses we have.  The
>self testing is great for what it does, but this is a better way
>to seek out problems.   Each program we test against will exploit
>different weaknesses.   Also we will tend to beat various
>programs in characteristic ways.  Rebel seems to always win through
>some passed-pawn theme.  We have to beat Rebel in the middlegame
>and be in great shape by the endgame or we will lose.   We tend
>to outplay Crafty in the middlegame, as long as king safety is
>not involved.  If we survive the king safety and Crafty survives
>the middlegame, Crafty will tend to win more often than not.
>Sometimes there are exceptions, but these themes occur over and
>over and it's a different thing with each program, so we learn
>a lot in this way.  You don't get this from self testing.
>
>But the improvements we make in any of these areas will always
>show up in self testing.  Self testing is highly underrated,
>probably because someone decided long ago that it was bad and
>it got repeated too many times!
>
>
>- Don

You are right, Don.

I would add that self testing excels in helping you decide whether a change you
are making in the way you prune/extend the tree is good or not; a minimal
sketch of such a harness follows.
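
To make this concrete, here is a minimal A/B self-play harness in Python. It
is only a sketch, not Tiger's actual test setup: the play_game() hook is
hypothetical and stands in for whatever engine-vs-engine driver you have. It
plays a match with alternating colours and turns the score into an Elo
estimate with a rough error bar.

    import math
    import random

    def play_game(white_version, black_version):
        # Hypothetical hook: wire this to your engine-vs-engine driver.
        # Must return 1.0, 0.5 or 0.0 from the white engine's point of view.
        return random.choice([1.0, 0.5, 0.0])  # placeholder only

    def self_test(new_version, old_version, games=1000):
        score = 0.0
        for i in range(games):
            # Alternate colours so neither version gets a first-move bias.
            if i % 2 == 0:
                score += play_game(new_version, old_version)
            else:
                score += 1.0 - play_game(old_version, new_version)
        # Clamp so a perfect score does not blow up the logarithm below.
        s = min(max(score / games, 1e-6), 1.0 - 1e-6)
        # Invert s = 1 / (1 + 10^(-elo/400)) to get the Elo difference.
        elo = -400.0 * math.log10(1.0 / s - 1.0)
        # 1-sigma error bar: binomial noise on s, scaled by d(elo)/d(score).
        se_score = math.sqrt(s * (1.0 - s) / games)
        se_elo = se_score * 400.0 / (math.log(10.0) * s * (1.0 - s))
        return s, elo, se_elo

The error bar also shows why Don's "measure a 1 rating point improvement"
challenge is so hard: at parity, 1 Elo point is only about 0.0014 in score
fraction, while the noise on the score is up to about 0.5/sqrt(games), so a
two-sigma detection of a 1-point gain needs on the order of half a million
games.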

I would also say that I understand Thorsten's point of view, because I also see
situations where self testing fails to give any useful information. This is
often the case when I change positional weights, for example. I can create a
"B" version that beats the original "A" version by changing positional weights.
Then I can create another "C" version that beats "B". Then I try version "C"
against "A", and "A" wins. Ooops...

My conclusion is that self testing is a tool you must have, but you have to be
able to identify situations where it does not work.


    Christophe


