Author: Charles Roberson
Date: 07:28:47 03/04/06
Since you haven't had a response, I'll respond. Probably the best answer is to do it yourself. Easily done or hard to do: take your pick.

1) Grab several suites and run several engines (32 should do) through them.
2) The engines should be quite varied in strength (very weak to very strong).
3) Now the easy or hard part:
   a) Easy: compare the results to various rating lists (WBEC, Gunther Simon's, George Lyapko's, Oliver Deville's, ...). Each has various time controls, so they did the hard part for you.
   b) Hard: if you don't do (a), then do the work of (a) yourself by running lots of tournaments at various time controls using the programs you are testing, and more if you can.

If you go with steps 1, 2, and 3a, you should be able to do the work reasonably quickly. Make sure you have a decent mathematical method for correlation testing of your results. You could look for several correlations:

1) Correlation between rankings: rating list order versus order of suite performance.
2) Rating correlation: is there a solid correlation between exact rating and number of problems solved?

There are others, but remember that you should run the correlation tests for each different time control. This is good fundamental research that you will be doing. The likely answer is that most test suites are not good indicators; the interesting answer would be that a test suite exists that does correlate with tournament performance.

If you don't know how to do correlation tests, you can find the equations in an elementary statistics book. Pearson's R should be a good coefficient to use (there is a small sketch at the end of this post). Maybe you'll find that only the weak engines correlate with the data, or only the middle engines, or only the top ones. This seems problematic -- engines can have low ratings due to a bad search, a bad position evaluator, or any combination of the two. Thus, a positional suite versus a tactical suite could produce different results. The best suite should have a combination of positional and tactical positions. So you could make up test suites: use them by themselves, and make up new suites based on combinations of other suites.

Theoretically, it should be possible to put a suite together that predicts performance, because most of us test performance by running lots of matches, which is effectively a suite, just a very dynamic one. This is remotely similar to the idea of position learning. Position learning stores a position in a database when the program produces an incorrect eval for it in a real match; the next time the engine sees that position in a search, it uses the experience value from the database instead of the search value. If you create a suite of all the positions in the position-learning database and improve the program so that it does better on that suite of troublesome positions without degrading performance in other areas, then the program has improved.
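Here is a minimal sketch of the two correlation checks mentioned above: Pearson's R for "exact rating vs. problems solved" and a rank correlation (Spearman's rho) for "rating list order vs. suite order". The engine data in the example is made up purely for illustration; in real use you would plug in your own rating list Elo numbers and suite scores, one pair per engine per time control. Libraries such as scipy.stats provide pearsonr/spearmanr, but the plain-Python version below keeps the math visible.

import math

def pearson_r(xs, ys):
    # Pearson correlation coefficient between two equal-length samples.
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def spearman_rho(xs, ys):
    # Rank correlation: Pearson's r applied to the ranks of the samples.
    # (Ties are not averaged here; fine for a sketch, not for serious work.)
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0.0] * len(vals)
        for rank, i in enumerate(order):
            r[i] = float(rank + 1)
        return r
    return pearson_r(ranks(xs), ranks(ys))

# Hypothetical data: rating-list Elo and problems solved on one suite
# at one fixed time control, one pair per engine.
elo    = [2750, 2680, 2600, 2510, 2400, 2300, 2150, 2000]
solved = [ 185,  176,  170,  158,  150,  131,  120,  102]

print("Pearson r (rating vs. solved):", round(pearson_r(elo, solved), 3))
print("Spearman rho (ranking order): ", round(spearman_rho(elo, solved), 3))

Run this once per time control, as suggested above; an R close to 1 would mean the suite tracks tournament strength, while values near 0 would be the "most suites are not good indicators" result.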