Author: Christophe Theron
Date: 01:47:28 09/07/98
On September 06, 1998 at 20:06:32, Don Dailey wrote:

>On September 06, 1998 at 18:54:20, Thorsten Czub wrote:
>
>>On September 06, 1998 at 16:55:48, Don Dailey wrote:
>>
>>>I think your method is pretty flawed if it depends very much on
>>>subjective analysis. For instance, if I did not like a program,
>>>I would be looking for problems and I would find them. I would
>>>dismiss the good moves and make excuses. I wouldn't do this
>>>on purpose, I would do it simply because I was human.
>>
>>We can bet that I am faster than your statistical approach and also more
>>precise.
>>My method has only one drawback: I can only be exact with programs I
>>have a positive feeling for, a relationship with.
>>
>>My method works like relationships with human beings.
>>You can only make positive progress if your relation with the other human
>>being is OK. If the relation is bad, you cannot produce any good results.
>>
>>>Your method should always be a part of the whole testing philosophy,
>>>however. You can get into the trap of playing thousands of
>>>games and not seeing a single one. I think you must continue
>>>to do what you do, but other, more objective tests MUST also be
>>>performed.
>>
>>I do statistical stuff too. But it is not done to decide about strength; it is
>>only done to prove my prejudices, or to put it more positively, to prove my
>>thesis.
>>
>>I do not begin with RESULTS. I only use them to confirm the judgement.
>>
>>Very often the empirical data produced after a few months of playing gives
>>the same judgement as my first impression of the program. This is the same
>>with human beings. Of course there are a few human beings that impress you,
>>and after a while you find out they are liars and assholes who spent much
>>of their lives acting as if they were nice friends.
>>Chess or humans. It is exactly the same for me.
>>Believing in friends, believing in programs. Exactly the same.
>>
>>>And taking a liking to the way a program plays
>>>is just a whole lot different from knowing if it can get results.
>>>
>>>Self testing is an important part of the way we test. It is
>>>ideal for some things, not very good for others. We test in all
>>>sorts of different ways and watch our program play to learn
>>>what is wrong. I don't think anyone has found the ideal method
>>>for accurately measuring tiny program improvements.
>>
>>Right. It is all the methods together. But still, despite all the different
>>methods, I believe there is a non-empirical, emotional way of doing it that
>>also gives exact results. It is just difficult to find out WHY it works.
>>
>>>If you
>>>give me a way to measure a 1 rating point improvement, I will
>>>write you a chess program that beats all the others with no
>>>question about who is strongest.
>>>
>>>Larry had a theory that self testing exaggerates improvements.
>>>If you make a small improvement, it will show up more with
>>>self testing. If this is true (we called it a theory, not a
>>>fact) then this is a wonderful way to test, because small
>>>improvements are so hard to measure and this exaggerates them.
>>
>>It can also, IMO, lead you in a completely wrong direction,
>>when the new feature only works against program X and not against all the
>>other programs Y, or against humans. You think you have an improvement, and
>>in fact it is only an improvement relative to X.
>>
>>>We know that self testing won't expose our program to weaknesses
>>>that it itself cannot exploit. That's why we consider it as
>>>a supplement to other kinds of testing.
>>
>>No problem with this point of view. My point was that self-testing alone will
>>not work, in the same way that test suites alone do not show you anything.
>>It is all of it together. And even then it is complicated enough and takes
>>enough time.
>>
>>>- Don
>
>
>We agree that self testing alone will not work.
>I have heard lots
>of theories about WHY self testing is horrible, but I have never seen
>this be much of a factor in practice. What I DO see is that self
>testing may not expose a problem that testing against a human or
>a program by a different author might. But I have never seen it fail
>to measure a program improvement. For instance, let us say our
>program has zero king safety. We keep improving it and find that
>it gets stronger and stronger, and yet having no king safety does
>not seem to be a big weakness, because neither program exploits
>this fact. You don't need to understand it if the opponent does
>not understand it either! But if I implement king safety in
>one version, this program will not fail to exploit it in the
>non-king-safety version. It will register immediately as an
>improvement.
>
>We play OTHER programs to find out what weaknesses we have. The
>self testing is great for what it does, but this is a better way
>to seek out problems. Each program we test against will exploit
>different weaknesses. Also, we will tend to beat various
>programs in characteristic ways. Rebel seems to always win with
>some passed pawn issue. We have to beat Rebel in the middlegame
>and be in great shape by the endgame or we will lose. We tend
>to outplay Crafty in the middlegame, as long as king safety is
>not involved. If we survive the king safety and Crafty survives
>the middlegame, Crafty will tend to win more often than not.
>Sometimes there are exceptions, but these themes occur over and
>over, and it's a different thing with each program, so we learn
>a lot this way. You don't get this from self testing.
>
>But the improvements we make in any of these areas will always
>show up in self testing. Self testing is highly underrated,
>probably because someone decided long ago that it was bad and
>it got repeated too many times!
>
>
>- Don

You are right, Don.
I would add that self testing excels in helping you decide whether a change in the way you prune/extend the tree is good or not.

I would also say that I understand Thorsten's point of view, because I also see situations where self test fails to give any useful information. This is often the case when I change positional weights, for example. I can create a "B" version that beats the original "A" version by changing positional weights. Then I can create another "C" version that beats "B". Then I try version "C" against "A", and "A" wins. Oops...

My conclusion is that self test is a tool you must have, but you have to be able to identify situations where it does not work.

Christophe