Author: Christophe Theron
Date: 16:00:58 12/31/97
On December 28, 1997 at 23:38:12, Don Dailey wrote:

>I did a really interesting study once several years ago. I took
>a small problem set and adjusted the weights to predict the Swedish
>ratings of several programs. You can use various methods to do
>this, I used a genetic algorithm. I was able to come up with a
>formula which was very accurate, within about 10 points for ANY
>program that was involved in the test.
>
>This test should probably be repeated. It should involve as many
>accurately rated programs as possible. The opening book hacks that
>some programmers may be using could hurt the accuracy of this
>test though since there is a possibility the book is the main
>source of the programs strength.
>
>To be really accurate I think it's a mistake to only count total
>problems solved. Time of solution should be a factor. Because
>this is meaningful information that should not be thrown away.
>
>But it turns out this is the simplest thing to do, it's much harder
>to construct a good scoring function for problem sets that take
>time into account and allows you to not solve some problems.
>
>-- Don

Have you ever heard of the Louguet Chess Test 2 (LCT-II) by French journalist/programmer Frederic Louguet?

It is a set of 36 positions, covering "positional", "tactical" and "endgame" themes, that you can use to measure the strength of any program. It records the time used to solve each problem (max 10 minutes), converts the times to points with a simple rating table, and gives you the "SSDF" ELO of the program.

Surprisingly, it works very well, and generally gives the right ELO for each known program (on each known platform) within a 20-point margin. The French magazine "La Puce Echiquéenne" has published results for many well-known programs over several years.

The main advantage of this test is that you can get a pretty good idea of the strength of any program in less than 3 hours. New programs have been rated by the French magazine months before the SSDF list mentioned them.

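To make the idea of a time-based rating table concrete, here is a minimal Python sketch of that kind of scoring. Only the 10-minute cap and the 36-position count come from the description above; the time bands, the points per band, the base rating and all the names (`SCORE_TABLE`, `BASE_RATING`, `points_for`, `estimate_rating`) are illustrative assumptions and should not be read as Louguet's actual table.

```python
# Sketch of a time-based test-suite scoring function.
# ASSUMPTION: the bands, points and base rating below are illustrative
# placeholders, not the real LCT-II table.
SCORE_TABLE = [  # (upper bound on solve time in seconds, points awarded)
    (9, 30),
    (29, 25),
    (89, 20),
    (209, 15),
    (389, 10),
    (600, 5),
]
BASE_RATING = 1900  # assumed offset added to the point total
MAX_SECONDS = 600   # 10-minute cap per position, as described in the post


def points_for(solve_seconds):
    """Points for one position; None means the program never found the move."""
    if solve_seconds is None or solve_seconds > MAX_SECONDS:
        return 0
    for limit, points in SCORE_TABLE:
        if solve_seconds <= limit:
            return points
    return 0


def estimate_rating(solve_times):
    """Estimated rating from a list of per-position solve times in seconds."""
    return BASE_RATING + sum(points_for(t) for t in solve_times)


# Four positions: solved in 5 s, solved in 20 s, unsolved, solved too late.
print(estimate_rating([5, 20, None, 700]))  # 1955
```

A table like this handles both of Don's concerns at once: faster solutions earn more points, and an unsolved position simply scores zero instead of breaking the formula.
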
I suppose Louguet used a statistical method to build the test (that is, to pick from a very large set of positions the subset whose results most closely match the SSDF ranking).

LCT is accurate as long as you don't use it to tune your program. For example, I have found some changes that gave Tiger a near 2600 "ELO" on the test (on a PII-300MHz). But in actual games, this version is very weak.

Christophe
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.