Author: Dave Gomboc
Date: 22:45:33 04/27/04
On April 27, 2004 at 18:06:17, Dann Corbit wrote:

>On April 27, 2004 at 17:57:15, Eric Oldre wrote:
>
>>My engine (murderhole) has gotten to the point where it's not easy for me to
>>tell if a given change has helped or hurt more. So I need to come up with a
>>better, more quantitative way of testing.
>>
>>Of course I know that the only sure test is lots and lots of games. but i just
>>don't have the patience, and the results can vary so much.
>>
>>my idea, and i'm sure you all have thought of something similar, but probably
>>better (that's why i'm posting) is:
>>
>>1) create a series of test positions probably small at first (50-100) but would
>>need to grow later.
>>
>>2) generate a list of all possible moves from each test position and the
>>resulting position after the move.
>>
>>3) have some strong program generate scores for each resulting position, and
>>therefore a score for the preceding move.
>>
>>4) then i could run my program against each position and see how often it picked
>>the best, 2nd best, 3rd best, etc.
>>
>>I could store all the test positions and scores in an XML file perhaps.
>>
>>The only problem is that setting this up would be a pain and spare time is not
>>something I have lots of these days (like all of us i'm sure) so if i can avoid
>>some work i'm all for it.. I was hoping someone might have some files for
>>configuring this stuff publicly available, or maybe it is even a feature of some
>>commercial program that i don't know of.
>>
>>Even if only pieces of this are out there it could help. Or something similar.
>>
>>Any ideas?
>
>Test suites do not work for this purpose. They are good for judging tactical
>strength, but poor estimators of game strength. If you optimize for tactics,
>then the program will play poorly. I can generate a large boost in the tactical
>strength of Beowulf by tuning using test suites. Then it gets murdered in
>actual games.
>
>It may be that quiet moves could be a good indicator.
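For concreteness, step 4 of the quoted proposal (counting how often the engine's choice is the reference program's best, 2nd best, 3rd best move) might be tallied roughly as below. This is only a sketch under assumptions: the position keys, move strings, and centipawn-style scores are made-up placeholders, the function names are hypothetical, and the reference scores are assumed to be already computed by some strong program from the side to move's point of view. It says nothing about whether such a score predicts game strength, which is the objection raised in the quoted reply.

```python
from collections import Counter


def rank_of_choice(move_scores, chosen):
    """Return the 1-based rank of `chosen` among the reference-scored moves.

    `move_scores` maps each legal move to the reference program's score
    (higher is better for the side to move). Moves with equal scores share
    the better rank, so two equally good best moves both count as rank 1.
    """
    if chosen not in move_scores:
        raise ValueError(f"move {chosen!r} has no reference score")
    chosen_score = move_scores[chosen]
    # Rank = 1 + number of moves the reference program scored strictly higher.
    return 1 + sum(1 for score in move_scores.values() if score > chosen_score)


def rank_histogram(suite, choices):
    """Count how often the tested engine picked the 1st, 2nd, 3rd... best move.

    `suite` maps a position id to its {move: reference score} table.
    `choices` maps the same position ids to the move the tested engine played.
    """
    histogram = Counter()
    for position, move_scores in suite.items():
        histogram[rank_of_choice(move_scores, choices[position])] += 1
    return dict(sorted(histogram.items()))


# Made-up data purely for illustration (not real engine output).
suite = {
    "pos1": {"e2e4": 30, "d2d4": 25, "g1f3": 20},
    "pos2": {"e7e5": -10, "c7c5": -5, "e7e6": -20},
    "pos3": {"g1f3": 15, "b1c3": 15, "a2a3": -40},
}
choices = {"pos1": "d2d4", "pos2": "c7c5", "pos3": "b1c3"}

print(rank_histogram(suite, choices))  # {1: 2, 2: 1}
```

Comparing the histogram before and after a change gives one number per rank to look at; a set weighted toward quiet positions, as suggested in the quoted reply, would plug into the same tally.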
Yes, I agree that typical test sets fail to provide an appropriate balance of positions from which one can work on improving their program for game conditions. Disclaimer: I am not a chess program author.

>Another possibility is to look at what Dave Gomboc did in his thesis.
>Seems like it would require lots of hardware, though.

Mmm, well, lots of hardware helps :-) but also a big help is to be in full control of the software you're testing. Theoretically I could change Crafty as much as I liked, but in practice my experiments wouldn't have been valid if I broke something in it accidentally, and besides, if there was extensive change then Crafty would no longer be Crafty.

I was very careful about what I changed, and almost always ended up doing things in slow ways using a relatively small interface to Crafty's functionality that I was sure would work, as opposed to performing invasive code surgery, because my purpose was researching a novel technique, not creating a highly efficient implementation. There were even orders-of-magnitude speed-ups I didn't implement but described in the future work section. I'm sure that someone who knows their chess program well could get much better performance using my tuning method than I did with Crafty.

Nonetheless, even if people don't adopt the technique I propose, if they read my thesis and come away with improved understanding, I'm happy.

The thesis URL has changed (I glued the front and the main part together :-) but it's still available from http://www.cs.ualberta.ca/~dave.

Dave
Last modified: Thu, 15 Apr 21 08:11:13 -0700