Author: Bruce Moreland
Date: 15:55:12 09/27/99
On September 27, 1999 at 14:33:34, Georg v. Zimmermann wrote:

>I'm not sure whether i completely understand your post. Thx for the crafty-tip
>btw.
>
>I remember that one of the positional LCTII positions is solved by doctor3 in 5
>seconds, and never solved by fritz, one is solved by hiarcs, but not by the
>others.
>These positional moves are clearly (for a strong human) the best ones in the
>given position. Doesn't this test chess-knowledge, not speed ?

Here is my point.

Let's say that you play 10,000 round robin tournaments between programs A, B, C, and D, and you get the following estimated ratings:

A 2400
B 2425
C 2450
D 2475

I can build a test that will allow you to predict the ratings of these programs. The test is:

Elo = 2375 + N * 25

Where N is the ordinal value of the name of the program (A=1, B=2, C=3, D=4), and integer math is used.

This formula predicts the ratings of each program with absolute accuracy. If you do this test at home, you will be able to predict the ratings yourself, and it will always work.

Assume this test is done on a 200 MHz computer, and you guess that doubling speed is worth 100 Elo points. Then the formula can be modified as follows:

Elo = 2375 + N * 25 + log2(mhz / 200) * 100

At 200 MHz the last term is zero, so this will produce the same values as before, only it will also accurately predict the rating if you increase processor speed.

You can't argue with this test. It is a perfect predictor for these four programs on any computer, assuming that 100 points per doubling holds up.

Why won't anyone agree that this is a good test? Because the test has no relevance to the strength of the programs. I couldn't get anyone to agree that what you call your program matters enough that it affects the rating like this.

But look at these suite tests.
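For anyone who wants to try the toy formula, here is a minimal Python sketch of it. The function name `toy_elo` and its parameters are made up for illustration, and it uses 2375 as the base constant in both the plain and the speed-adjusted case so that the 200 MHz result matches the table of ratings above. It only shows that a formula keyed on the program's name reproduces the numbers it was fitted to; it measures nothing about chess.

```python
import math


def toy_elo(name: str, mhz: float = 200.0) -> float:
    """Toy rating "predictor" keyed on the program's name (hypothetical helper).

    N is the ordinal of the name (A=1, B=2, C=3, D=4). The speed term
    assumes, as in the post, that doubling clock speed is worth 100 Elo.
    """
    n = ord(name.upper()) - ord("A") + 1
    return 2375 + n * 25 + 100 * math.log2(mhz / 200.0)


if __name__ == "__main__":
    for program in "ABCD":
        # Rating at the 200 MHz baseline and at double that speed.
        print(program, toy_elo(program), toy_elo(program, mhz=400.0))
```

At the 200 MHz default the speed term is zero, so the function returns exactly the four calibration ratings, which is the whole trick.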
Such suites are also created using a fixed set of programs, on a particular processor, and calibrated to a scale that has been predetermined (typically the SSDF list, or someone's feelings about what the SSDF list should really show).

Do you think that this isn't done? I can't imagine someone just doing a test and picking a formula at random and magically the right Elo numbers come out. No, the test and the formula are both calibrated against some predefined reality.

When you run one of these suites at home, on one of the programs that the test was calibrated with, you are just replaying the number that the suite author determined that the program should get. It's not predicting anything; the whole test is a recording of predetermined "facts" for many of the most popular programs.

There is some question about whether the suite can be accurate for the programs that it wasn't calibrated with, but since most of these suites probably use the most popular programs to calibrate, you can't really compare anything. If one of the calibration programs comes back 2475, and some new program comes back with 2450, who is to say that the new program is really weaker, since the test has been fiddled with until it produces 2475 for the first program?

bruce
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.