Computer Chess Club Archives



Subject: Re: Junior's long lines: more data about this....

Author: Chris Whittington

Date: 13:03:23 12/29/97



On December 29, 1997 at 15:08:00, Don Dailey wrote:

>On December 29, 1997 at 12:03:20, Robert Hyatt wrote:
>
>>On December 29, 1997 at 11:42:08, Don Dailey wrote:
>>
>>>On December 29, 1997 at 01:34:35, Bruce Moreland wrote:
>>>
>>>>
>>>>On December 28, 1997 at 23:38:12, Don Dailey wrote:
>>>>
>>>>>I did a really interesting study once several years ago.  I took
>>>>>a small problem set and adjusted the weights to predict the Swedish
>>>>>ratings of several programs.  You can use various methods to do
>>>>>this, I used a genetic algorithm.  I was able to come up with a
>>>>>formula which was very accurate, within about 10 points for ANY
>>>>>program that was involved in the test.
>>>>
>>>>If you produced a formula that would accurately predict the Elo of 75%
>>>>of the known programs, would it accurately predict the Elo of the
>>>>remaining 25% without tweaking it?
>>>>
>>>>bruce
>>>
>>>That's a great question.  This one could be tested without too much
>>>trouble if I were to repeat the test.
>>>
>>>I have a feeling the important thing is to start with as many programs
>>>as possible.   If I tuned to 2 or 3 programs it would not predict
>>>well because it could take too many liberties to get those ratings
>>>just right.   But if I started with many it might be "forced" to come
>>>up with realistic weights that reflected some kind of reality.
>>>
>>>But of course with more programs the procedure would probably not
>>>do as well in the worst case; it's unlikely I would get within
>>>10 rating points with all my initial testees.
>>>
>>>-- Don
>>
>>
>>As I read this I chuckled internally, thinking of the test Larry Kaufman
>>wanted to try on Cray Blitz in Indianapolis at the ACM event.  He had a
>>formula he was convinced was *very* accurate in matching a program's
>>results to its "SSDF" equivalent rating.
>>
>>You probably remember the humorous result, where Cray Blitz solved almost
>>all of the test positions in under a second, which totally blew his formula
>>out the window, because CB was *not* a 2600+ program.
>>
>>I don't trust any formula that was "fit" to a known set of programs. That's
>>a simple least-squares solution to fitting a polynomial to a known set of
>>data points, and obviously it is accurate for the points along the curve,
>>since that's how the curve was derived in the first place.  But when you
>>toss in a new program that is *not* similar to the others (CSTal comes to
>>mind) then this "formula" is not just wrong, but *badly* wrong.  Ditto for
>>any program that is somehow different from the programs used to produce the
>>formula...
>>
>>The basic flaw here is called "statistical inbreeding"...
>
>
>Larry's test did not use my procedure.  Also, I'm not saying my procedure
>has any value; it was just an idea.
>
>I think the biggest flaw in the tests is simply the problem selection
>process.  If a human took  Larry's test I'll bet he would perform
>quite poorly due to your "statistical inbreeding" phenomenon.   But
>that's why I believe a good test should involve a large and varied
>sample.

But isn't the real flaw with tests that they test for finding solutions,
but not how to get into those positions in the first place? And which
positions get steered to is very subjective. Tal would steer differently
from Tarrasch ....

Chris Whittington


>Probably even humans should be timed in their solution times.
>But I don't think human times correlate very well with computer times,
>humans tend to take longer on easy problems and shorter on difficult
>ones.  Humans use a very flexible selective search algorithm that
>can adapt in wonderful ways.   So I have a feeling it would be virtually
>impossible to come up with a good set for humans and computers!
>
>I have a feeling that in some sense the problem set was really testing
>something else fairly accurately and your program deserved the rating
>it got!   We just should not consider it a "chess rating" but
>something else.
>
>-- Don
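
Bruce's held-out question and Robert's least-squares objection, quoted above, can be illustrated with a small sketch. This is not Don's genetic-algorithm procedure or Larry's formula; it is a toy comparison under stated assumptions. All scores and ratings below are synthetic numbers invented for illustration, and the names (fit_line, fit_interpolant, max_abs_error) are hypothetical. One model is a polynomial forced through every tuning program, the other is a plain least-squares line, and both are then asked about a program that was not in the tuning set and lies outside its range.

```python
# Toy illustration only: every number here is synthetic, not real SSDF data.

def fit_line(xs, ys):
    """Ordinary least-squares straight line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def fit_interpolant(xs, ys):
    """Lagrange polynomial passing exactly through every (x, y) pair."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict

def max_abs_error(model, xs, ys):
    """Worst-case rating error of a model over a set of programs."""
    return max(abs(model(x) - y) for x, y in zip(xs, ys))

# Hypothetical problem-set scores and ratings for the programs used in tuning.
train_scores = [10, 12, 14, 16, 18, 20]
train_ratings = [2000, 2060, 2090, 2170, 2190, 2260]

# A hypothetical program that was NOT used in tuning and scores outside the range.
held_scores = [24]
held_ratings = [2350]

line_model = fit_line(train_scores, train_ratings)
interp_model = fit_interpolant(train_scores, train_ratings)

for name, model in (("line", line_model), ("interpolant", interp_model)):
    tuned = max_abs_error(model, train_scores, train_ratings)
    unseen = max_abs_error(model, held_scores, held_ratings)
    print(f"{name}: worst error on tuning set {tuned:.1f}, on held-out {unseen:.1f}")
```

With these made-up numbers the interpolant reproduces every tuning rating exactly yet misses the held-out program by a very wide margin, while the cruder line is less exact on the tuning set but stays close on the unseen program. That is only a caricature of the argument in the thread, and it says nothing about how any real test set behaves.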


