Author: Don Dailey
Date: 12:08:00 12/29/97
On December 29, 1997 at 12:03:20, Robert Hyatt wrote:

>On December 29, 1997 at 11:42:08, Don Dailey wrote:
>
>>On December 29, 1997 at 01:34:35, Bruce Moreland wrote:
>>
>>>On December 28, 1997 at 23:38:12, Don Dailey wrote:
>>>
>>>>I did a really interesting study once several years ago. I took
>>>>a small problem set and adjusted the weights to predict the Swedish
>>>>ratings of several programs. You can use various methods to do
>>>>this; I used a genetic algorithm. I was able to come up with a
>>>>formula which was very accurate, within about 10 points for ANY
>>>>program that was involved in the test.
>>>
>>>If you produced a formula that would accurately predict the Elo of 75%
>>>of the known programs, would it accurately predict the Elo of the
>>>remaining 25% without tweaking it?
>>>
>>>bruce
>>
>>That's a great question. This one could be tested without too much
>>trouble if I were to repeat the test.
>>
>>I have a feeling the important thing is to start with as many programs
>>as possible. If I tuned to 2 or 3 programs it would not predict well,
>>because it could take too many liberties to get those ratings just
>>right. But if I started with many, it might be "forced" to come up
>>with realistic weights that reflected some kind of reality.
>>
>>But of course with more programs the procedure would probably not do
>>as well in the worst case; it's unlikely I would get within 10 rating
>>points with all my initial testees.
>>
>>-- Don
>
>As I read this I chuckled internally, thinking of the test Larry Kaufman
>wanted to try on Cray Blitz in Indianapolis at the ACM event. He had a
>formula he was convinced was *very* accurate in matching a program's
>results to its "SSDF" equivalent rating.
>
>You probably remember the humorous result, where Cray Blitz solved
>almost all of the test positions in under a second, which totally blew
>his formula out the window, because CB was *not* a 2600+ program.
>
>I don't trust any formula that was "fit" to a known set of programs.
>That's a simple least-squares solution to fitting a polynomial to a
>known set of data points, and obviously it is accurate for the points
>along the curve, since that's how the curve was derived in the first
>place. But when you toss in a new program that is *not* similar to the
>others (CSTal comes to mind), then this "formula" is not just wrong,
>but *badly* wrong. Ditto for any program that is somehow different
>from the programs used to produce the formula...
>
>The basic flaw here is called "statistical inbreeding"...

Larry's test did not use my procedure, and I'm not saying my procedure
has any value; it was just an idea. I think the biggest flaw in these
tests is simply the problem selection process. If a human took Larry's
test, I'll bet he would perform quite poorly due to your "statistical
inbreeding" phenomenon.

But that's why I believe a good test should involve a large and varied
sample. Probably humans should even be timed on their solutions. But I
don't think human times correlate very well with computer times; humans
tend to take longer on easy problems and less time on difficult ones.
Humans use a very flexible selective search algorithm that can adapt in
wonderful ways. So I have a feeling it would be virtually impossible to
come up with a good problem set for both humans and computers!

I have a feeling that in some sense the problem set was really testing
something else fairly accurately, and your program deserved the rating
it got! We just should not consider it a "chess rating" but something
else.

-- Don
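[Editor's note: a minimal sketch of the 75/25 hold-out experiment Bruce
proposes, in Python with entirely made-up numbers. Plain least squares
stands in for the genetic algorithm; none of this is from the actual
study, and the program scores and ratings are invented for illustration.]

import numpy as np

rng = np.random.default_rng(0)
n_programs, n_problems = 20, 12

# Hypothetical data: scores[i, j] = 1 if program i solved problem j
# in time, 0 otherwise.
scores = rng.integers(0, 2, size=(n_programs, n_problems)).astype(float)
X = np.hstack([scores, np.ones((n_programs, 1))])  # intercept column

# Hypothetical "true" ratings, loosely tied to how many problems each
# program solves, plus noise.
ratings = 2200.0 + 40.0 * scores.sum(axis=1) \
          + rng.normal(0.0, 25.0, n_programs)

# Fit one weight per problem on 75% of the programs (least squares,
# a stand-in for tuning the weights with a genetic algorithm).
split = int(0.75 * n_programs)  # 15 programs to fit, 5 held out
w, *_ = np.linalg.lstsq(X[:split], ratings[:split], rcond=None)

train_err = np.abs(X[:split] @ w - ratings[:split]).mean()
test_err = np.abs(X[split:] @ w - ratings[split:]).mean()
print(f"mean error, fitted programs:   {train_err:5.1f} Elo")
print(f"mean error, held-out programs: {test_err:5.1f} Elo")

# With only 15 programs and 13 free weights, the fit nearly
# interpolates: the fitted error looks impressively small while the
# held-out error is much larger -- Bob's "statistical inbreeding"
# in miniature. Adding more programs than weights forces more
# realistic weights, as Don suggests above.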