Computer Chess Club Archives



Subject: Re: A theory of ratings drift for the SSDF

Author: Robert Hyatt

Date: 11:20:51 04/11/02



On April 11, 2002 at 12:09:25, Sune Fischer wrote:

>On April 11, 2002 at 09:48:13, Robert Hyatt wrote:
>
>>On April 10, 2002 at 17:58:02, Sune Fischer wrote:
>>
>>>On April 10, 2002 at 16:13:23, Robert Hyatt wrote:
>>>
>>>>On April 09, 2002 at 16:04:01, Dann Corbit wrote:
>>>>
>>>>
>>>>One quick note.  You are falling into the same "trap" that 99% of the
>>>>people here fall into... treating the "rating" as "absolute".  It is not.
>>>>You should compare the rating of (say) 1996 chessmaster to 1996 genius,
>>>>then compare the 2002 ratings for both and see if the "spread" has
>>>>changed much.  If it has, something is wrong.  If it has not, then the
>>>>Elo system is working perfectly...
>>>>
>>>>The absolute rating probably should drop since new and more skilled players
>>>>are entering the "pool" each year...  But the spread between two programs
>>>>should not change significantly...
>>>
>>>Why would the spread change if they still use the same formula?
>>
>>
>>Because the _pool_ has changed.  The "new" programs will _necessarily_ be
>>stronger than the old.  And with Elo, there is a "conservation of rating pool"
>>built in...  both players get their ratings adjusted by the same amount, but
>>with "opposite sign".
>
>I can only agree with you that the old programs would get pushed down by the
>newer and stronger programs, but this is about the average dropping for the old
>programs, not about the spread.


Perhaps I missed something along the way, because I didn't see any "spread"
change at all...  although it is certainly possible that the spread can
change as stronger players enter the pool.  The spread is a statistical
prediction about two specific players, but it is estimated from results across
_all_ the players in the pool.  If you change the pool of players in any way,
the average rating can change and the spread can change.  The former more than
the latter, of course.
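
For concreteness, here is a rough Python sketch of the Elo expectancy, assuming
the common logistic form on a 400-point scale (the SSDF may use Elo's original
tables, but the point is the same): the "spread" between two players is just a
prediction of their head-to-head score.

def expected_score(r_a, r_b):
    """Expected score of A against B (win = 1, draw = 0.5, loss = 0)."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

print(expected_score(2500, 2400))   # 100-point spread -> about 0.64
print(expected_score(2600, 2400))   # 200-point spread -> about 0.76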

>
>>But the spread must necessarily stay the same for two players that have a
>>constant probability of beating each other.  Although you could add 1000 to
>>every pool player's rating and things would continue to work just fine.  In
>>fact, you can adjust everyone's rating by a single constant without changing
>>a thing.  The statistics still work just fine...
>
>If the "spread" is the same for any two players, how can the spread then change
>at all? :)

As I said, the spread should _not_ change.  But new and stronger players in
the pool are going to "squash" everyone else down while they climb, since the
points within the pool remain more or less constant...
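
To make that point-conservation concrete, here is a toy Python simulation,
assuming a standard K-factor update; the pool, the K value and the newcomer's
200-point edge are all invented for illustration.  One under-rated newcomer
climbs by pulling points out of everyone else, while the total rating of the
pool never changes.

import random

K = 16

def expected_score(r_a, r_b):
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

def update(r_a, r_b, score_a):
    """New ratings after one game; score_a is 1, 0.5 or 0 from A's side."""
    delta = K * (score_a - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta   # zero-sum: the pool total never changes

# Toy pool: everyone starts at 2400, but "new" actually plays 200 points stronger.
rating = {"old1": 2400.0, "old2": 2400.0, "old3": 2400.0, "new": 2400.0}
true   = {"old1": 2400.0, "old2": 2400.0, "old3": 2400.0, "new": 2600.0}

random.seed(1)
for _ in range(2000):
    a, b = random.sample(list(rating), 2)
    score_a = 1.0 if random.random() < expected_score(true[a], true[b]) else 0.0
    rating[a], rating[b] = update(rating[a], rating[b], score_a)

print(rating)                 # "new" climbs; the old programs get pushed down
print(sum(rating.values()))   # still ~9600: points only move around inside the pool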


>
>>
>>
>>>The difference in elo between players is just related to the win/lose ratio
>>>between them, so the spread should stay fixed if the win/lose ratio remains the
>>>same.
>>>
>>>Of course the scale could drift up or down, but since programs perform at a
>>>constant level, we do have a tool to correct for that.
>>>As I suggested in a different post, one could simply take a group of programs,
>>>find their average and make sure that average remains constant.
>>>It would be far better with a large group than just one or two programs, with
>>>much smaller error bars on the "absoluteness" of the scale.
>>
>>Finding the "average" is statistically invalid.  Elo's formula is _only_
>>interested in the difference in rating between two players.  The absolute
>>value doesn't mean a thing.
>
>It doesn't mean a thing _now_.  But it could be made absolute by the
>adjustment I suggested :)

It will _never_ mean anything.  The absolute rating is meaningless and is
totally arbitrary in the first place.  Just start everyone at 10000 elo and
things _still_ work perfectly.  Trying to normalize between two pools is
not easy.  Trying to normalize between more than two pools is impossible...
And the SSDF represents dozens of pools since new players are added each
year...
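
To illustrate that last point, here is a small sketch (again assuming the common
logistic expectancy): shifting every rating in the pool by the same constant
leaves every prediction, and therefore every rating update, unchanged.

def expected_score(r_a, r_b):
    # Common logistic Elo expectancy on a 400-point scale (an assumption;
    # any formula that depends only on the rating difference behaves the same).
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

# Add the same constant to every rating: predictions are identical, so the
# system keeps working exactly as it would if everyone had started at 10000.
for shift in (0, 1000, 10000):
    print(expected_score(2600 + shift, 2400 + shift))   # 0.7597... each time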




>
>
>> The average of these values also doesn't mean a
>>thing...  This is why you should _never_ try to equate SSDF ratings to FIDE
>>ratings.  The pools are different.  The values are different.  They mean nothing
>>outside their own pool...
>>
>
>They only differ by an added constant: by letting a designated group of
>programs get a rating in both pools, we could calibrate the scales to give the
>group the same average.
>Same spread (if same formula) plus same average -> I'd say same scale.
>
>What am I missing here, isn't it that simple?

No.  This falls under sampling theory...  The "why" is a complex question to
answer.  But averages of averages are not useful...



>
>-S.
>
>>>-S.
>>>>


