Author: Bill McGaugh
Date: 13:18:48 02/11/99
I don't know how other people do it, but I came up with a method based on the Elo system itself, using the results from the SSDF list.
-----------------------------------------------------
Here is the idea: I assign Elo ratings to the problems themselves. To calibrate the problems I must run them at three minutes a move (because SSDF ratings are based on that time control) on a variety of programs. For example, let's say we have 10 different programs with an average rating on the SSDF list of 2450 (all on the same platform). If we run a particular problem on all 10 programs and all solve it, we throw it out. If we run it and all fail to solve it, we throw it out. If we find a problem where 5 programs solve it and 5 fail (in three minutes), the problem is given an Elo rating of 2450 (50%). If we find a problem that only one out of 10 programs can solve, it is given an Elo rating equal to winning 9 games out of 10 = 2450 + 358 (from Elo's tables) = 2808.

Once we have calibrated a large number of problems, we can use these problem ratings to rate programs. We compute the average difficulty rating for the suite of problems and then test a program by running it through the suite. If the suite rating is 2400 and the program gets 50% of the problems in the given time, then it has a rating of 2400, and so on. I hope this makes sense.
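The calibration step above can be sketched in code. This is a minimal illustration, not the author's actual tooling; it uses the logistic Elo expected-score formula as a stand-in for Elo's printed expectancy tables, so the numbers differ slightly from the post's (the tables, based on the normal distribution, give +358 for 9 wins out of 10; the logistic formula gives about +382). The function name and signature are hypothetical.

```python
import math

def problem_rating(avg_engine_rating, solved, tested):
    """Assign an Elo rating to a test position.

    A position failed by (tested - solved) of `tested` engines (whose
    average rating is `avg_engine_rating`) is rated as a player who
    scores that failure fraction against the field: the harder the
    problem, the higher its rating.  Uses the logistic Elo formula
    as an approximation of Elo's expectancy tables.
    """
    if solved == 0 or solved == tested:
        # Per the method: all-solve and all-fail problems are thrown out.
        raise ValueError("problem solved by all or none; discard it")
    p_fail = (tested - solved) / tested  # problem's "score" vs. the field
    return avg_engine_rating + 400 * math.log10(p_fail / (1 - p_fail))
```

For example, `problem_rating(2450, 5, 10)` gives 2450 (the 50% case), and `problem_rating(2450, 1, 10)` gives about 2832, versus the post's 2808 from Elo's tables.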
I started checking out the idea by taking some of the limited data from the CCR site. Using the times for the Louguet 2 and the BT test, I arrived at the following, after throwing out the "irrelevant" problems:

bt4  - 2664    bt9  - 2664    bt12 - 2254    bt24 - 2254
bt25 - 2388    bt26 - 2388    bt29 - 2521    bt30 - 2254
lgt3 - 2628    lgt4 - 2197    lgt5 - 2628    lgt6 - 2136
lgt7 - 2443    lgt8 - 2320    lgt9 - 2259    lgt10 - 2443
lgt11 - 2505   lgt12 - 2689   lgt13 - 2628   lgt17 - 2074
lgt20 - 2013   lgt21 - 2259   lgt22 - 2013   lgt23 - 2197
lgt24 - 2505   lgt29 - 2259   lgt30 - 2443   lgt31 - 2259
lgt32 - 2443   lgt33 - 2566

Average rating for the suite = 2376.47, so a score of 15 out of 30 = 2376. The full conversion (from Elo's percentage expectancy tables, rounded to the nearest percent, no interpolation between percents):

 1/30 = 1838     9/30 = 2227    17/30 = 2426    25/30 = 2649
 2/30 = 1954    10/30 = 2251    18/30 = 2448    26/30 = 2698
 3/30 = 2010    11/30 = 2281    19/30 = 2471    27/30 = 2742
 4/30 = 2054    12/30 = 2304    20/30 = 2501    28/30 = 2798
 5/30 = 2103    13/30 = 2326    21/30 = 2525    29/30 = 2914
 6/30 = 2136    14/30 = 2355    22/30 = 2551
 7/30 = 2165    15/30 = 2376    23/30 = 2587
 8/30 = 2201    16/30 = 2397    24/30 = 2616

Testing a few programs on my P100 with the suite:

Zarkov 4.5c = 2426
Mchess 7.1  = 2426
Hiarcs 6    = 2448
Rebel 8     = 2471

I also tried a little experiment on a K6-233, to study the effect of doubling the time on the rating (using Zark 4.5c):

  10 seconds -  8 positions solved - 2201 rating
  20         - 12                  - 2304
  40         - 16                  - 2397
  80         - 18                  - 2448
 160         - 18                  - 2448
 320         - 21                  - 2525
 640         - 22                  - 2551
1280         - 24                  - 2616

(an average of 59.29 points per doubling, almost exactly what people have been saying for some time)

I think that my rating system has potential, but what I need to do is base the problem ratings on a larger number of different programs over a broader range of ratings, and assemble a large suite (100+ problems) that is a nice combination of opening, middlegame, and endgame positions, combining both tactical and positional problems.
----------------------------------------------------
The notes above are from over a month ago. I'm continuing to work on building a test suite based on this method.
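The score-to-rating conversion above can be sketched the same way. Again this is an illustration with a hypothetical function name, using the logistic Elo formula rather than Elo's printed tables, so it reproduces the table only approximately (e.g. it gives 2400 for 16/30 where the table above gives 2397):

```python
import math

def suite_rating(suite_avg, solved, total):
    """Convert a score on a calibrated suite into an engine rating.

    A 50% score equals the suite's average difficulty; any other score
    maps to the rating difference whose expected score matches it.
    Perfect and zero scores have no finite Elo equivalent, which is why
    the table in the post stops at 29/30.
    """
    if solved == 0 or solved == total:
        raise ValueError("0% and 100% scores have no finite rating")
    p = solved / total
    return suite_avg + 400 * math.log10(p / (1 - p))
```

For example, `suite_rating(2376.47, 15, 30)` returns the suite average of about 2376. The doubling experiment's 59.29 points per doubling is simply the total gain divided by the number of doublings: (2616 - 2201) / 7.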