Author: Don Dailey
Date: 23:24:09 12/31/97
On December 31, 1997 at 19:00:58, Christophe Theron wrote:
>
>On December 28, 1997 at 23:38:12, Don Dailey wrote:
>
>>I did a really interesting study once several years ago. I took
>>a small problem set and adjusted the weights to predict the Swedish
>>ratings of several programs. You can use various methods to do
>>this; I used a genetic algorithm. I was able to come up with a
>>formula which was very accurate, within about 10 points for ANY
>>program that was involved in the test.
>>
>>This test should probably be repeated. It should involve as many
>>accurately rated programs as possible. The opening book hacks that
>>some programmers may be using could hurt the accuracy of this
>>test, though, since there is a possibility the book is the main
>>source of the program's strength.
>>
>>To be really accurate, I think it's a mistake to count only total
>>problems solved. Time of solution should be a factor, because
>>this is meaningful information that should not be thrown away.
>>
>>But counting solutions turns out to be the simplest thing to do;
>>it's much harder to construct a good scoring function for problem
>>sets that takes time into account and allows for unsolved problems.
>>
>>-- Don
>
>Have you ever heard about the Louguet Chess Test 2 (LCT-II) by French
>journalist/programmer Frederic Louguet? It is a set of 36 positions
>including "positional", "tactical" and "endgames" that you can use to
>measure the strength of any program. It takes the time used to solve
>each problem (max 10 minutes), uses a simple rating table, and gives you
>the "SSDF" ELO of the program. Surprisingly, it works very well, and
>generally gives the right ELO for each known program (on each known
>platform) within a 20 point margin.
>
>The French magazine "La Puce Echiquéenne" has published results for
>many well known programs for several years. The main advantage of this
>test is that you can get a pretty good idea of the strength of any
>program in less than 3 hours. New programs have been rated by the French
>magazine months before the SSDF list mentioned them.
>
>I suppose Louguet used a statistical method to build the test (in fact
>to get the subset from a very large set of positions that gives the
>closest match to the SSDF ranking).
>
>LCT is accurate if you don't use it to improve your program. For
>example, I have found some changes that gave Tiger a near 2600 "ELO" (on
>PII-300MHz). But in games, this version is very weak.
>
>
>    Christophe

I'm very interested in getting this set. If I do, I will not pay
any attention to the positions and will not try to understand them,
just run them! Can you tell me where to get them?

-- Don

I did another interesting test once. I took a randomized database of
positions with master moves and noted the master responses. I used a
huge sample of about 20 thousand positions. I tested at 2, 3, 4, 5,
etc. plies just to see how often Socrates matched the master move.

I found a very nice smooth improvement with depth. I thought, finally,
maybe this is a decent way to measure improvement! I would get
hundreds more positions right with each additional ply.

So then I decided to turn off all the big pawn structure stuff and try
the test. I self-tested thoroughly to verify that pawn structure was
indeed a MAJOR source of strength in Socrates; it was worth perhaps 100
rating points or more.

The results at a given depth came out virtually the same! I was
completely baffled. I didn't check into this much further, but
my hypothesis now is that there is no concept of "weighting" here. Not
playing a master move is not the same as making a horrible pawn
structure error, and this test gives them the same weight.

-- Don
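
The weighting hypothesis above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Result` record, the centipawn costs, and the two sample result lists are invented, not data from the Socrates experiment. It only shows how two evaluation settings can have an identical master-move match rate while differing sharply once each miss is weighted by how bad it was.

```python
from dataclasses import dataclass


@dataclass
class Result:
    matched: bool  # did the engine play the master's move?
    loss_cp: int   # hypothetical cost of the engine's move in centipawns (0 if matched)


def match_rate(results):
    """Unweighted score: every miss counts the same."""
    return sum(r.matched for r in results) / len(results)


def mean_loss(results):
    """Weighted score: a miss counts in proportion to its (assumed) cost."""
    return sum(r.loss_cp for r in results) / len(results)


# Invented sample data: both settings miss the master move twice,
# but one of the misses without pawn structure terms is a serious error.
full_eval = [Result(True, 0), Result(True, 0), Result(False, 10), Result(False, 15)]
no_pawn_eval = [Result(True, 0), Result(True, 0), Result(False, 10), Result(False, 120)]

if __name__ == "__main__":
    print("match rate:", match_rate(full_eval), match_rate(no_pawn_eval))
    print("mean loss :", mean_loss(full_eval), mean_loss(no_pawn_eval))
```

In practice the per-move cost would have to come from somewhere, for example a deeper reference search, and choosing that source is a design decision this sketch does not address.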
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.