Computer Chess Club Archives



Subject: Re: Junior's long lines: more data about this....

Author: Don Dailey

Date: 12:08:00 12/29/97



On December 29, 1997 at 12:03:20, Robert Hyatt wrote:

>On December 29, 1997 at 11:42:08, Don Dailey wrote:
>
>>On December 29, 1997 at 01:34:35, Bruce Moreland wrote:
>>
>>>
>>>On December 28, 1997 at 23:38:12, Don Dailey wrote:
>>>
>>>>I did a really interesting study once several years ago.  I took
>>>>a small problem set and adjusted the weights to predict the Swedish
>>>>ratings of several programs.  You can use various methods to do
>>>>this; I used a genetic algorithm.  I was able to come up with a
>>>>formula which was very accurate, within about 10 points for ANY
>>>>program that was involved in the test.
>>>
>>>If you produced a formula that would accurately predict the Elo of 75%
>>>of the known programs, would it accurately predict the Elo of the
>>>remaining 25% without tweaking it?
>>>
>>>bruce
>>
>>That's a great question.  This one could be tested without too much
>>trouble if I were to repeat the test.
>>
>>I have a feeling the important thing is to start with as many programs
>>as possible.  If I tuned to 2 or 3 programs it would not predict
>>well, because the tuner could take too many liberties to get those
>>ratings just right.  But if I started with many, it might be "forced"
>>to come up with weights that reflected some kind of reality.
>>
>>But of course with more programs the procedure would probably not
>>do as well in the worst case; it's unlikely I would get within
>>10 rating points for all of my initial testees.
>>
>>-- Don
>
>
>As I read this I chuckled internally, thinking of the test Larry Kaufman
>wanted to try on Cray Blitz in Indianapolis at the ACM event.  He had a
>formula he was convinced was *very* accurate in matching a program's
>results to its "SSDF" equivalent rating.
>
>You probably remember the humorous result: Cray Blitz solved almost all
>of the test positions in under a second, which totally blew his formula
>out of the water, because CB was *not* a 2600+ program.
>
>I don't trust any formula that was "fit" to a known set of programs.
>That's just a least-squares fit of a polynomial to a known set of data
>points: obviously it is accurate for the points along the curve, since
>that's how the curve was derived in the first place.  But when you toss
>in a new program that is *not* similar to the others (CSTal comes to
>mind), then this "formula" is not just wrong, but *badly* wrong.  Ditto
>for any program that is somehow different from the programs used to
>produce the formula...
>
>The basic flaw here is called "statistical inbreeding"...
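
A quick way to see the least-squares trap Bob describes above is to fit
a polynomial exactly through a handful of (score, rating) points and
then evaluate it off the training data.  A minimal sketch in Python;
every number below is invented for illustration, and this is not
anyone's actual formula:

  import numpy as np

  # Invented (test-suite score, rating) pairs for five programs.
  scores  = np.array([10.0, 14.0, 17.0, 21.0, 25.0])
  ratings = np.array([2150, 2260, 2330, 2410, 2490])

  # A degree-4 polynomial passes through all five points exactly...
  coeffs = np.polyfit(scores, ratings, deg=4)
  print(np.polyval(coeffs, scores) - ratings)   # residuals are ~0

  # ...but a program unlike the training set (say it solves nearly
  # everything instantly, score 35) lands far off the curve.
  print(np.polyval(coeffs, 35.0))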


Larry's test did not use my procedure, and I'm not claiming my procedure
has any value; it was just an idea.
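
For concreteness, here is roughly the kind of procedure I mean.  Every
number below is invented (the solve data, the ratings, the GA settings);
it is only a sketch of the general idea, not the original experiment:

  import random

  # Invented data: which of 6 problems each of 4 programs solved,
  # plus each program's published rating.
  solved = [
      [1, 1, 0, 1, 0, 0],
      [1, 1, 1, 1, 0, 0],
      [1, 1, 1, 1, 1, 0],
      [1, 0, 1, 1, 1, 1],
  ]
  ratings = [2150, 2260, 2330, 2410]

  def predict(weights, base, row):
      # Predicted rating = base + the weights of the solved problems.
      return base + sum(w for w, s in zip(weights, row) if s)

  def worst_error(weights, base):
      return max(abs(predict(weights, base, row) - r)
                 for row, r in zip(solved, ratings))

  # Crude genetic algorithm: keep the half of the population with the
  # smallest worst-case error, then refill it with mutated children.
  pop = [([random.uniform(0, 200) for _ in range(6)], 2000)
         for _ in range(50)]
  for gen in range(500):
      pop.sort(key=lambda wb: worst_error(*wb))
      pop = pop[:25]
      for w, b in list(pop):
          pop.append(([x + random.gauss(0, 10) for x in w],
                      b + random.gauss(0, 10)))

  best = min(pop, key=lambda wb: worst_error(*wb))
  print("worst-case rating error:", worst_error(*best))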

I think the biggest flaw in the tests is simply the problem selection
process.  If a human took Larry's test, I'll bet he would score quite
poorly, thanks to your "statistical inbreeding" phenomenon.  But that's
why I believe a good test should involve a large and varied sample.
Probably even the humans should be timed on their solutions.  But I
don't think human times correlate very well with computer times; humans
tend to take longer on easy problems and less time on difficult ones.
Humans use a very flexible selective search algorithm that can adapt in
wonderful ways.  So I have a feeling it would be virtually impossible
to come up with a good set for both humans and computers!
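
Bruce's 75/25 question above would also be cheap to check whenever the
tuning is repeated: hold out a quarter of the programs, fit on the rest,
and look at the error on the held-out programs only.  A sketch of that
bookkeeping, with invented data and ordinary least squares standing in
for whatever tuner is actually used:

  import numpy as np

  rng = np.random.default_rng(0)

  # Invented data: 12 programs, 6 problem scores each, plus ratings.
  scores  = rng.uniform(0.0, 1.0, size=(12, 6))
  ratings = (2000 + scores @ np.array([80, 60, 120, 40, 90, 70])
             + rng.normal(0, 15, size=12))

  # Hold out 25% of the programs; tune only on the rest.
  idx = rng.permutation(12)
  test, train = idx[:3], idx[3:]

  # Stand-in tuner: least squares with an intercept column.
  A = np.hstack([scores, np.ones((12, 1))])
  w, *_ = np.linalg.lstsq(A[train], ratings[train], rcond=None)

  # The interesting number: error on programs never seen in tuning.
  print(np.abs(A[test] @ w - ratings[test]))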

I have a feeling that in some sense the problem set was really testing
something else fairly accurately, and your program deserved the rating
it got!  We just should not consider it a "chess rating" but something
else.

-- Don