Author: Don Dailey
Date: 20:38:12 12/28/97
On December 28, 1997 at 22:34:10, Fernando Villegas wrote:

>Don:
>Can you explain to me why it is not better (IF it is not better) to test a program
>or its different versions just by testing them with a really big set
>of problems of a tactical and positional nature, where the best and second
>best moves have been established? I presume that the program or version
>that solves more and/or in less time should be the better one. It cannot be
>otherwise. On the contrary, testing each of them against the others leaves
>the chance that a certain version that objectively, by chess standards,
>is not the best is at the same time precisely the kind that defeats the real
>best one. You know, as happens between human players, where sometimes
>there is a black beast even for the best, nobody knows why. I mean,
>maybe you can thrash precisely the big winner because you faced it with
>its black beast. OK, I know you also use this method, but I was
>thinking about all you said in the talk with Theron about this system of
>playing hundreds of games, and I recalled this weird phenomenon of
>"precisely" the black beast. I would be very afraid to lose something
>really good because of that kind of sudden death method.
>Fernando, just a customer with a great mouth.

Hi Fernando,

I use problem sets for debugging and first-order testing of tactical improvements, extensions, etc. I have a few sets I understand well: I know which algorithms each problem responds to and why. So they work well for debugging.

But so far I've never seen a problem set that can accurately measure improvements to the program. It's almost as if a lot of algorithms that help on problems hurt real play. I believe a good set could be constructed, but it would have to contain a lot of positions of all different kinds, including many positional problems.

Another problem is that problem sets are very difficult to construct.
It often turns out that there are multiple solutions, or there is some real question about whether a move really is best. When Larry Kaufman constructed sets for us he often threw out problems after a time; in short, even his set evolved with time.

Yet another problem with problem sets is that they don't measure how well your program integrates itself into the play. Your program may have many weaknesses that a problem set could uncover, but it may also be very skillful at avoiding these weaknesses. This is similar to how humans play too. If I cannot handle a particular opening very well I simply avoid it! If someone tested me on this opening they might conclude I'm even weaker than I really am.

The real issue with any kind of testing is how much information you get back for your investment in time. If version A wins 1 game against version B you know almost nothing about their relative strength and only have a very weak hypothesis that A is stronger. And if they are closely matched, it could take a lot of games to determine which is the real favorite. So not much information is bound to a single game.

But you can think of a game as a problem set that is generated on the fly. The program that makes the best choices in this problem set will be the winner. Each position returns a different amount of information, though. A completely obvious move tells you little about the program; any program will quickly find it. But a really sophisticated move can tell you much more. How to separate out these positions? I do not know.

-- Don

I did a really interesting study several years ago. I took a small problem set and adjusted the weights to predict the Swedish ratings of several programs. You can use various methods to do this; I used a genetic algorithm. I was able to come up with a formula which was very accurate, within about 10 points for ANY program that was involved in the test. This test should probably be repeated.
It should involve as many accurately rated programs as possible. The opening book hacks that some programmers may be using could hurt the accuracy of this test, though, since there is a possibility that the book is the main source of a program's strength.

To be really accurate, I think it's a mistake to count only the total number of problems solved. Time of solution should be a factor, because it is meaningful information that should not be thrown away. But counting solved problems is the simplest thing to do; it's much harder to construct a good scoring function for problem sets that takes time into account and still allows you to not solve some problems.

-- Don
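
As an illustration of the last point, here is a minimal sketch of one possible time-aware scoring function. It is not the formula from the study described above; the function names, the 60-second limit, the 0.1-second floor and the half-credit rule are all assumptions chosen only for the example. An unsolved problem scores 0, a problem solved right at the time limit scores 0.5, and faster solutions move toward 1.0 on a logarithmic time scale.

```python
import math


def problem_score(solve_time, time_limit=60.0, min_time=0.1):
    """Score one problem in [0, 1] (hypothetical scheme, for illustration).

    solve_time: seconds needed to find the key move, or None if unsolved.
    time_limit: seconds allowed per problem (assumed value).
    min_time:   times below this all count as "instant" (assumed value).
    """
    # Unsolved, or solved only after the limit: no credit.
    if solve_time is None or solve_time > time_limit:
        return 0.0
    # Clamp so that very small times do not blow up the logarithm.
    t = max(solve_time, min_time)
    # Speed bonus on a log scale: 1.0 at min_time, 0.0 at time_limit.
    speed = math.log(time_limit / t) / math.log(time_limit / min_time)
    # Half credit for solving at all, the other half for speed.
    return 0.5 + 0.5 * speed


def suite_score(solve_times, time_limit=60.0):
    """Average score over a whole problem set."""
    return sum(problem_score(t, time_limit) for t in solve_times) / len(solve_times)


if __name__ == "__main__":
    # Made-up solve times in seconds; None marks an unsolved problem.
    version_a = [0.5, 12.0, None, 3.0]
    version_b = [2.0, 45.0, 58.0, 9.0]
    print("A:", round(suite_score(version_a), 3))
    print("B:", round(suite_score(version_b), 3))
```

A plain count would rate version A at 3 of 4 and version B at 4 of 4; a scheme like this also lets solve times influence the comparison. Whether half credit is the right trade-off between solving and speed is exactly the hard part, and the weights would need tuning against real results.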
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.