Author: Don Dailey
Date: 19:54:02 01/03/98
On January 03, 1998 at 21:15:20, Robert Hyatt wrote:

>On January 03, 1998 at 18:14:53, Don Dailey wrote:
>
>>On January 02, 1998 at 21:59:51, Robert Hyatt wrote:
>>
>>>On January 02, 1998 at 13:53:17, Stuart Cracraft wrote:
>>>
>>>>Is there a formula for translating ELO to USCF rating?
>>>>
>>>>I've heard that at some levels it is a 100+ difference
>>>>on the USCF side but that it varies, lower differences
>>>>for higher ratings.
>>>>
>>>>Anyway, this is to convert the ELO 2040 rating of a Louguet II
>>>>test result to USCF.
>>>>
>>>>Thanks,
>>>>Stuart
>>>
>>>first, this premise is totally wrong. Ken Sloan posted to r.g.c.c
>>>last year analyzing the difference in FIDE ratings and USCF ratings.
>>>The "average" is less than 50 points with USCF being higher than
>>>FIDE, but for the upper end I seem to recall that 30 was the right
>>>"fudge".
>>>
>>>Second, forget taking a test suite, running it, and getting an Elo
>>>(FIDE) type rating. It ain't going to happen. You won't get anything
>>>anywhere close to the true rating of the program.
>>
>>Bob,
>>
>>I don't think this is a ridiculous idea, there are just a lot of
>>problems that must be solved first before it can be done correctly.
>>Just because it hasn't been done well yet doesn't mean it cannot be!
>>
>>I seem to remember you are right about the Elo points. I think at one
>>time there was a much larger difference but some gradual adjustments
>>have been taking place over the years.
>>
>>-- Don
>
>I'd agree that it *can* be done, but it hasn't, and likely won't for a
>long while. The problem is explained as follows:
>
>Humans and computers are different. Humans have a mixture of tactical
>and positional skills that blend together. It is possible that you
>might solve something in one way one time, and find something different
>the next. On the other hand, computers are *very* specific in their
>search strategies and their evaluations... and they apply things the
>same way every time.
>So a program either "gets it or it doesn't get it"... And its
>knowledge can be very narrow (a tactical searcher/finder like Fritz).
>Which is so unlike what humans do that comparing the two is quite
>difficult.
>
>A good test is to take a human and give him several well-known problem
>suites and he will do similarly on most or all of them, while a
>computer will be "all over the place"... killing the tactical ones,
>failing miserably on the positional ones, and even failing on some
>tactical ones...
>
>Fitting a formula to make the results from a suite match several
>programs doesn't work either... for obvious reasons...

It may take a while, I agree. I see 2 issues here:

1. Measuring Elo with a test.
2. Measuring improvement with a test.

They are both tough problems that won't be easily solved. I think a
reasonably good test could be done if the problems fit most of these
criteria:

1) Lots of positions, perhaps hundreds.
2) No ambiguous answers - like you say, a program must not flounder
   around once it "gets it."
3) But point 2 should not just degrade to positional tactics either.
4) More positional problems than tactical.
5) Problems are weighted.
6) Question-mark problems are important (don't play this move).
7) The programmers must never see this set!

But a serious problem would be how to choose these problems correctly
and get the weights right. If someone really understood what made one
program better than another (other than the obvious generalizations),
that person could eventually construct a really good set.

A great set could get close to an accurate rating, but would still be
subject to a few more problems that sets can't measure, like opening
books, time control algorithms, and a bunch of intangibles that can
make a difference (like one program cutting the time a little closer
than another, better overtime thinking, better obvious-move algorithms,
optimization for a program's own particular style, etc.).
But I believe these will tend to average out for most programs, though
this test certainly would underrate some and overrate others. Still,
"the play's the thing" - if it measured chess skill well, it would do
ok.

Requirement 7 would not be honored, though, since programmers would
love to have such a set if it were pretty good at measuring relatively
small improvements. I do think that if the set were large enough and
you optimized your program to do well on it, you couldn't help but
improve your program.

I have to admit constructing this set seems impossible. But I believe
an omniscient being could do it! In other words, I believe this set
exists.

-- Don
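Just to make the idea concrete: here is a minimal sketch of how such a weighted suite might be scored and mapped to a rating. Everything in it is hypothetical - the results, the weights, and especially the calibration table (which is exactly the part Don says only an omniscient being could get right). It is an illustration of the mechanics, not a real test.

```python
def suite_score(results, weights):
    """Weighted fraction of problems handled correctly.

    results: 1 if the program solved the problem (or avoided the
             '?' move, criterion 6), 0 if it failed.
    weights: per-problem importance (criterion 5).
    """
    earned = sum(w * r for w, r in zip(weights, results))
    return earned / sum(weights)

def estimate_elo(score, calibration):
    """Map a suite score to an Elo estimate by linear interpolation
    over (score, rating) calibration points - entirely made up here."""
    pts = sorted(calibration)
    if score <= pts[0][0]:
        return pts[0][1]
    for (s0, r0), (s1, r1) in zip(pts, pts[1:]):
        if score <= s1:
            return r0 + (r1 - r0) * (score - s0) / (s1 - s0)
    return pts[-1][1]

# Invented calibration: suite scores of reference programs with
# known ratings.  Getting these pairs right is the hard problem.
calibration = [(0.30, 1600), (0.50, 2000), (0.70, 2300), (0.85, 2500)]

results = [1, 0, 1, 1, 0, 1]   # solved / failed, one entry per problem
weights = [3, 1, 2, 2, 1, 3]   # heavier weight = more telling problem

s = suite_score(results, weights)           # 10/12, about 0.833
print(round(s, 3), round(estimate_elo(s, calibration)))
```

A larger set would simply extend the two lists; the averaging-out Don hopes for only happens once there are hundreds of positions (criterion 1) so that no single weight dominates.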