Author: Rolf Tueschen
Date: 03:48:38 06/21/04
On June 20, 2004 at 03:28:36, Cesar Contreras wrote:

>I have a serious problem testing improvements in my chess engine.
>
>I use test suites to find bugs and to establish comparisons between different
>versions of my engine. I know that I can't use test suites to calculate Elo;
>that's clear to me.
>
>But I think there can be a test that gives me an approximation (with an acceptable
>error margin) of the comparative strength of different versions of my engine with
>different modifications.

You write that you think there can be a test ... but this is exactly what is wrong. There is none.

>I don't plan to cheat; that's why I'm not going to try
>to tune my engine to solve the test.
>
>I think I can make an analogy with IQ tests: there are several aspects evaluated
>in IQ tests, and different intelligences. Or with a psychological test that evaluates a
>lot of things about a person. Both have an error margin and are affected by
>cheating, environment, time, sickness, etc. But it's up to the person who runs the
>test to try to avoid such problems.

You write that you think you can make an analogy with IQ tests ... but this is also wrong.

>That test could give not only a strength comparison but also some information, like
>performance in the opening, middlegame and endgame, or maybe something more specific
>like the degree of care for pawn structure, mobility and king safety.

Yes, this is exactly what all the programmers are dreaming about, but it doesn't exist, and if you think you could make such a test then you are misled.

>Maybe there can be different versions of the test, each one of them with a different
>number of positions, giving an error margin for each version.
>
>A good test suite specially oriented to chess programmers could be really great,
>again, not looking for perfect results but for approximations useful to us, and
>not only giving one number but maybe several numbers for the evaluation of several
>aspects.
You write "not exact but approximate", but this is exactly the problem: how approximate may the result be if you still want to base a sound decision on it?

>Is there any test that does that?
>Or is it just a dream?

It is a dream.

>If it is a dream, please share your method for testing your engine after
>modifications. The number of modifications we make to our engines is big, so the
>time spent on testing is really big, maybe 90% (not sure about it, what do
>you say?)

Oh, they won't tell you this, and that is the same as with their code. Nobody will tell you.

>There are several problems with running tournaments to measure the performance of the
>engine:
>- Selecting an appropriate number of engines
>- Selecting the engines
>- Number of games
>- Gauntlet, round robin or Swiss
>- Time control.
>And after running the tournament, how to know the approximate error margin of the
>result? Or the performance of the engine in several aspects (performance in the
>opening, middlegame, endgame, care for positional aspects)?

You must realise that being the programmer of a new engine also requires talent in testing. That's the whole secret. You can try to improve your current knowledge. Don't wait for publications with the solution to your problem. The good programmers have their own methods, and the not so good ones will use existing test suites, with less success in tournaments for sure. All the good programmers here have already stated that they don't work with test suites, not because they are stubborn but because test suites don't "work". Finally, let me inform you, without being arrogant, that there is no lack of good test methodology, because test theory is a well-established field at universities. What you are missing is a good test suite, and the answer is: there is no such test suite, because it's a contradiction in itself. You build a chess engine so that it can play chess. But chess isn't just puzzle solving. Worse: engines that are good at such test suites can well be weak in tournaments.
I'm talking about engines whose playing strength is still unknown! The many testers around who run test suites with famous chess engines work with *already* perfectly tuned, "well playing" tournament engines.

The last paragraph is a bit difficult to understand, so you should ask if you have questions. Like all the readers, of course.

For the experts I want to say that overlooking this little difference is, for many computer chess fans, the main reason they waste their precious time. They really do think that with their test suite testing they can reveal something important, which is nonsense, because their testing wouldn't function at all if you, the experts, hadn't prepared your new engine versions so well. It would help a lot if you could give some information about the possible integration of test suite solutions in your engines... As SMK of SHREDDER declared: "I could add a couple of test stuff so that nobody but me on this Earth could detect it". We must be thankful that at least sometimes someone drops some hints, but I doubt that they are fully understandable for laymen. So they are surprised and satisfied that "their" test results perfectly resembled the ones from tournament tests... Yes, what a surprise! ;)
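
On the quoted question of how to know the approximate error margin of a tournament result: the following is only a minimal sketch using textbook statistics, not any particular programmer's method. It assumes the games are independent, uses the standard logistic Elo formula and a normal approximation for the confidence interval, and the function name elo_diff_with_margin and its parameters are made up for this illustration.

```python
import math

def elo_diff_with_margin(wins, draws, losses, z=1.96):
    """Estimate the Elo difference from a match result, with a rough interval.

    Assumptions (not from the original post): games are independent, and a
    normal approximation of the mean score is acceptable. z=1.96 corresponds
    to roughly a 95% confidence interval.
    """
    n = wins + draws + losses
    if n == 0:
        raise ValueError("no games played")

    # Mean score per game: a win counts 1, a draw 0.5, a loss 0.
    score = (wins + 0.5 * draws) / n

    # Per-game variance of the score, then the standard error of the mean.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    se = math.sqrt(var / n)

    def to_elo(p):
        # Clamp to avoid division by zero at 0% or 100% scores.
        p = min(max(p, 1e-6), 1.0 - 1e-6)
        return -400.0 * math.log10(1.0 / p - 1.0)

    return to_elo(score), to_elo(score - z * se), to_elo(score + z * se)

if __name__ == "__main__":
    estimate, low, high = elo_diff_with_margin(wins=50, draws=0, losses=50)
    print(f"Elo difference: {estimate:+.1f} (about {low:+.1f} to {high:+.1f})")
```

With 100 decisive games and an even score, the interval comes out on the order of 70 Elo in either direction, which shows how many games are needed before a small improvement can be told apart from noise.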
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.