Author: Bruce Moreland
Date: 13:25:10 02/11/99
On February 11, 1999 at 15:41:25, Dann Corbit wrote:

>Andreas Schwartmann asked an interesting question in r.g.c.c.:
>"I wonder if anyone can enlighten me on how to use various test suites, like
>LCT, LCT II and Covax. There are certain formulas on how to calculate the
>playing strength according to these test suites, right?"
>
>Now, ignoring the fact that they are full of bugs and the measures are probably
>bogus, how *does* one arrive at an ELO from a test suite evaluation?
>
>What is the actual mathematical basis for the calculations?

You come up with a formula that turns the solution times into an Elo rating, then check it against a reference set of programs, and if there is not a good match, go back to the beginning of this sentence.

The test becomes a very good predictor for those programs, which is obviously no big deal, since the formula was constructed after the test was run, and is *designed* to predict well for those programs. If you wanted, you could predict Elo ratings from the letters in the program's name, and you'd also get a good predictor.

The question is whether the suites are measuring something that has to do with chess strength. It makes sense that there is at least some connection, since the problems are typically middlegame tactical or positional problems, and everybody knows that tactical and positional speed are components of strength.

So for non-reference programs, perhaps they are measuring something that is roughly related to chess strength. But you have to keep in mind that for the reference programs, the scores produced are the scores that the suite builder wanted to produce. It's a bad trap to assume that someone's BS2830 score backs up their SSDF rating, if the BS2830 suite was calibrated using SSDF ratings as inputs.

I would never trust Elo numbers produced by a test suite.
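To make the "letters in the program's name" point concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the program names, the ratings, and the helper names `name_score` and `fit_formula` are invented, and the polynomial fit is just one convenient way to build a formula after the fact, not what any real suite does. The feature is deliberately meaningless, yet the fitted formula reproduces every reference rating exactly.

```python
from fractions import Fraction

# Hypothetical reference programs and ratings, invented for illustration only.
REFERENCE = {"Alpha": 2400, "Bravo": 2450, "Charlie": 2500, "Delta": 2550}


def name_score(name):
    """Meaningless feature: sum of letter positions (a=1 ... z=26)."""
    return sum(ord(c) - ord("a") + 1 for c in name.lower() if "a" <= c <= "z")


def fit_formula(reference):
    """Build a polynomial passing through every (feature, rating) pair.

    Uses Lagrange interpolation with exact fractions, so the formula has
    as many free parameters as there are reference programs. The feature
    values must all be distinct for this to work.
    """
    points = [(Fraction(name_score(n)), Fraction(r)) for n, r in reference.items()]

    def predict(name):
        x = Fraction(name_score(name))
        total = Fraction(0)
        for i, (xi, yi) in enumerate(points):
            term = yi
            for j, (xj, _) in enumerate(points):
                if i != j:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total

    return predict


predict = fit_formula(REFERENCE)

# In-sample check: the formula "predicts" the reference ratings perfectly,
# because it was constructed from them.
for name, rating in REFERENCE.items():
    print(name, rating, float(predict(name)))

# A program outside the reference set gets whatever the curve happens to give.
print("Echo", float(predict("Echo")))
```

The perfect match on the reference set says nothing about whether the feature has anything to do with chess strength. Only the number for a program that was not used in the calibration tests that, and here there is no reason to expect it to be sensible.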
I think it makes more sense to report the scores in a form that doesn't look like an Elo rating, so there would be less of a tendency to use them as Elo ratings. The scoring formula could be less complex, too.

Sorry that this post is somewhat scattered; there is an angry kid in the next room.

bruce
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.