Author: Christophe Theron
Date: 01:47:28 09/07/98
On September 06, 1998 at 20:06:32, Don Dailey wrote:
>On September 06, 1998 at 18:54:20, Thorsten Czub wrote:
>
>>On September 06, 1998 at 16:55:48, Don Dailey wrote:
>>
>>>I think your method is pretty flawed if it depends very much on
>>>subjective analysis. For instance, if I did not like a program,
>>>I would be looking for problems and I would find them. I would
>>>dismiss the good moves and make excuses. I wouldn't do this
>>>on purpose, I would do it simply because I was human.
>>
>>We can bet that I am faster than your statistical approach and also more
>>precise.
>>My method has only one negative problem: I can only be exact with programs I
>>have a positive feeling/relationship with.
>>
>>My method works like relationships with human beings.
>>You can only make positive progress if your relation with the other human
>>being is OK. If the relation is shit, you cannot produce any good results.
>>
>>
>>>Your method should always be a part of the whole testing philosophy
>>>however. You can get into the trap of playing thousands of
>>>games and not seeing a single one. I think you must continue
>>>to do what you do, but other, more objective tests MUST also be
>>>performed.
>>
>>I do statistical stuff too. But it is not done to decide about strength, it is
>>only done to prove my prejudices, or to put it more positively, to prove my
>>thesis.
>>
>>I do not begin with RESULTS. I only use them to prove the judgement.
>>
>>Very often the empirical data produced after a few months of playing gives
>>the same judgement as my first impression of the program. It is the same
>>with human beings. Of course there are a few human beings who impress you,
>>and after a while you find out they are liars and assholes who have spent much
>>of their lives acting as if they were nice friends.
>>Chess or humans. It is exactly the same for me.
>>Believing in friends, believing in programs. Exactly the same.
>>
>>
>>> And taking a liking to the way a program plays
>>>is just a whole lot different from knowing if it can get results.
>>>
>>>Self testing is an important part of the way we test. It is
>>>ideal for some things, not very good for others. We test all
>>>sorts of different ways and watch our program play to learn
>>>what is wrong. I don't think anyone has found the ideal method
>>>for accurately measuring tiny program improvements.
>>
>>Right. It is all the methods together. But still, despite all the different
>>methods, I believe there is a non-empirical, emotional way of doing it that
>>also gives exact results. But it is difficult to find out WHY it works.
>>
>>> If you
>>>give me a way to measure a 1 rating point improvement, I will
>>>write you a chess program that beats all the others with no
>>>question about who is strongest.
>>>
>>>Larry had a theory that self testing exaggerates improvements.
>>>If you make a small improvement, it will show up more with
>>>self testing. If this is true (we called it a theory, not a
>>>fact) then this is a wonderful way to test because small
>>>improvements are so hard to measure and this exaggerates them.
>>
>>It can also, IMO, lead you in a completely wrong direction:
>>when the new feature only works against x and not against all the other
>>programs y, or against humans. You think you have an improvement, when in fact
>>it is only an improvement relative to x.
>>
>>>We know that self testing won't expose our program to weaknesses
>>>that it itself cannot exploit. That's why we consider it as
>>>a supplement to other kinds of testing.
>>
>>No problem with this point of view. My point was that self-testing alone will
>>not work, in the same way that test suites alone do not show you anything.
>>It is all of it, together. And it is still complicated enough, and takes
>>enough time.
>>
>>>- Don
>
>
>We agree that self testing alone will not work. I have heard lots
>of theories about WHY self testing is horrible but I have never seen
>this be much of a factor in practice. What I DO see is that self
>testing may not expose a problem that testing against a human or
>a differently authored program might. But I have never seen it fail
>to measure a program improvement. For instance, let us say our
>program has zero king safety. We keep improving it and find that
>it gets stronger and stronger, and yet having no king safety does
>not seem to be a big weakness because neither program exploits
>this fact. You don't need to understand it if the opponent does
>not understand it either! But if I implement king safety in
>one version, that version will not fail to exploit its absence
>in the non-king-safety version. It will register immediately as an
>improvement.
>
>We play OTHER programs to find out what weaknesses we have. The
>self testing is great for what it does, but this is a better way
>to seek out problems. Each program we test against will exploit
>different weaknesses. Also we will tend to beat various
>programs in characteristic ways. Rebel seems to always win with
>some passed pawn issue. We have to beat Rebel in the middlegame
>and be in great shape by the endgame or we will lose. We tend
>to outplay Crafty in the middlegame, as long as king safety is
>not involved. If we survive the king safety and Crafty survives
>the middlegame, Crafty will tend to win more often than not.
>Sometimes there are exceptions, but these themes occur over and
>over and it's a different thing with each program, so we learn
>a lot in this way. You don't get this from self testing.
>
>But the improvements we make in any of these areas will always
>show up in self testing. Self testing is highly underrated,
>probably because someone decided long ago that it was bad and
>it got repeated too many times!
>
>
>- Don
You are right, Don.
I would add that self testing excels at helping you decide whether a change you
are making in the way you prune/extend the tree is good or not.
I would also say that I understand Thorsten's point of view, because I also see
situations where self test fails to give any useful information. This is often
the case when I change positional weights for example. I can create a "B"
version that beats the original "A" version by changing positional weights. Then
I can create another "C" version that beats "B". Then I try version "C" against
"A" and "A" wins. Ooops...
My conclusion is that self test is a tool you must have, but you have to be able
to identify situations where it does not work.
Christophe
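
As a footnote to Don's challenge above about measuring a 1 rating point
improvement: the Elo model itself suggests why nobody has an ideal method for
tiny improvements. Under the standard logistic formula, a 1-point edge means an
expected score of only about 0.5014 per game, so resolving it from match
results takes an enormous sample. Here is a minimal sketch of the arithmetic
(the function names are mine; it uses a plain two-sided normal approximation,
and treats each game as a win/loss trial, which overstates the per-game
variance when there are draws and so errs on the conservative side):

    import math

    def expected_score(elo_diff):
        """Expected score per game under the standard Elo logistic model."""
        return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

    def games_needed(elo_diff, z=1.96):
        """Games for a ~95% two-sided detection of an edge of elo_diff.

        Each game is treated as a win/loss Bernoulli trial; ignoring
        draws overstates the per-game variance, so this is conservative.
        """
        p = expected_score(elo_diff)
        edge = p - 0.5                    # shift in mean score to detect
        sigma = math.sqrt(p * (1.0 - p))  # per-game standard deviation
        return math.ceil((z * sigma / edge) ** 2)

    for d in (1, 5, 10, 20):
        print(f"{d:3d} Elo -> roughly {games_needed(d):,} games")

On these assumptions a 1 Elo edge needs on the order of 460,000 games to detect
at 95% confidence, and a 10 Elo edge still needs a few thousand, which is why
Larry's "exaggeration" effect in self testing, if real, would be so valuable.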